Moving from Physical Tapes to Amazon Glacier
Challenge
iN DEMAND, a premier distributor of on-demand videos, was still storing hundreds of thousands of physical tapes and manually retrieving them. They needed a secure digital archiving system that could integrate with Amazon Glacier.
iN DEMAND, a premier distributor for Pay-Per-View and Video-on-Demand content, serves as an intermediary between media companies, content providers and cable companies. iN DEMAND contacted Caktus to create a custom internal web application that would replace their tape-based movie archiving and restoration process.
Their old system required backups on hundreds of thousands of tapes, with each movie retrieved manually when requested. Because tapes degrade over time, every movie file had to be duplicated and stored both offsite and onsite. Even with onsite storage, retrieving a specific movie from the archive for a customer was both costly and time-intensive. Because retrieval was manual, after-hours requests often waited a full day until staff were available. Even then, the content metadata associated with each movie (its duration, aspect ratio, language version, rating and more) was stored in completely separate systems, which added time and cost to every retrieval.
Caktus was tasked with developing an automated replacement for this archive and retrieval system that would speed up fulfillment of client requests. iN DEMAND’s request called for custom integration with Amazon Glacier, Amazon’s low-cost, cloud-based storage service for infrequently accessed data, using the boto Python library, whose Glacier support had only just been released.
Solution
Caktus developed a Django web application, MiniDAM, to allow iN DEMAND to securely and reliably move their archive to Amazon Glacier and monitor uploads and downloads.
Caktus developed a web application named “MiniDAM” (“mini” data asset manager). MiniDAM monitors the files on a network drive that serves as the staging area for moving content from the tape archive to Amazon Glacier. Movies, closed-captioning tracks and other associated files are placed in a queue for metadata analysis, encryption, and compression. They are then uploaded to Amazon Glacier through the boto library for secure, cost-efficient storage. To fulfill customer orders, iN DEMAND staff download each movie from Glacier as needed. Caktus also built a custom web-based dashboard so that iN DEMAND staff can monitor and search the status of all uploads and downloads.
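The watch-folder step might look something like the following Python sketch. It polls the staging drive and reports files that have not yet been queued. This is an illustration under stated assumptions, not MiniDAM’s code. The function `find_new_files`, the polling approach and the example file extensions are all hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Set


def find_new_files(staging_dir: Path, already_queued: Set[str],
                   extensions: Iterable[str] = (".mov", ".mxf", ".scc")) -> List[Path]:
    """Return files under staging_dir that have not been queued yet.

    Hypothetical sketch: the extensions are illustrative assumptions, and
    MiniDAM's real detection logic may differ (e.g. filesystem events).
    """
    wanted = {ext.lower() for ext in extensions}
    new_files = []
    for path in sorted(staging_dir.rglob("*")):
        # Skip directories and file types that are not archived.
        if not path.is_file() or path.suffix.lower() not in wanted:
            continue
        # The path relative to the staging root is a stable queue key.
        key = str(path.relative_to(staging_dir))
        if key not in already_queued:
            new_files.append(path)
    return new_files
```

In a Django and Celery setup, each returned path would typically be handed to the task queue and its key added to `already_queued`, so the next scan skips it.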
Planning Process
To succeed, MiniDAM needed to integrate with Glacier. Yet at the start of the project, the proposed interface to Glacier, the boto library, had only just added Glacier support. The new system also needed configurable transfer speeds exceeding 150-200 Mb/sec for upload and 250 MB/sec for retrieval, so that it could use the available bandwidth cost-effectively.
The Caktus development team started by writing custom scripts to test how effectively the boto Python library would integrate with Glacier. Caktus worked closely with the iN DEMAND infrastructure team and the Glacier team at Amazon. Together they pinpointed a multi-threaded solution previously untried in boto: breaking data into separate pieces, uploading them in parallel and reassembling them on arrival. This enhancement let the tool run simultaneous uploads and downloads that met the project’s benchmark speeds.
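The split-upload-reassemble idea can be sketched with Python’s standard `concurrent.futures` thread pool. This is a minimal illustration, not the boto or MiniDAM implementation. `upload_part` is a hypothetical callback standing in for a real Glacier multipart request. The default 8 MiB part size reflects Glacier’s rule for multipart uploads: every part except the last must be a power-of-two number of mebibytes between 1 MiB and 4 GiB.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

MiB = 1024 * 1024


def split_into_parts(data: bytes, part_size: int) -> List[Tuple[int, bytes]]:
    """Split data into (offset, chunk) pairs of at most part_size bytes."""
    return [(offset, data[offset:offset + part_size])
            for offset in range(0, len(data), part_size)]


def parallel_upload(data: bytes, upload_part: Callable[[int, bytes], None],
                    part_size: int = 8 * MiB, max_workers: int = 4) -> int:
    """Upload every part concurrently and return the number of parts sent.

    upload_part(offset, chunk) is a hypothetical stand-in for a real
    multipart call; offsets let the receiver reassemble parts in order.
    """
    parts = split_into_parts(data, part_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(upload_part, off, chunk) for off, chunk in parts]
        for future in futures:
            future.result()  # re-raise any exception from a worker thread
    return len(parts)


def reassemble(received: Dict[int, bytes]) -> bytes:
    """Rebuild the original bytes by concatenating parts in offset order."""
    return b"".join(received[off] for off in sorted(received))
```

Tagging each chunk with its byte offset lets threads finish in any order. The receiver sorts by offset before concatenating.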
Development and Testing
Next, we had to turn our test scripts into a functioning tool. The first step was completing the background tasks that made upload and download work. Every task in this process had to run a precise sequence of steps. Each file transfer also had to make the most effective use of the available bandwidth to stay cost-effective. Our developers relied heavily on Celery, a Python task queue, to keep all tasks in the correct sequence.
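In Celery, a fixed order of steps is usually expressed with `celery.chain`. For example, `chain(analyze.s(path), compress.s(), upload.s())()` passes each task’s return value to the next task as its first argument. The following sketch stays runnable without a message broker by mimicking that data flow in plain Python. The names `run_in_sequence`, `StepFailed` and the step names (`analyze`, `compress`, `upload`) are hypothetical.

```python
from typing import Any, Callable, List, Sequence, Tuple

Step = Callable[[Any], Any]


class StepFailed(Exception):
    """Raised when a step fails; records which steps had already finished."""

    def __init__(self, step_name: str, completed: List[str], cause: Exception):
        super().__init__(f"step {step_name!r} failed: {cause}")
        self.step_name = step_name
        self.completed = completed


def run_in_sequence(steps: Sequence[Tuple[str, Step]],
                    payload: Any) -> Tuple[Any, List[str]]:
    """Run named steps in order, feeding each result into the next step.

    Mimics the data flow of a Celery chain without a broker. Returns the
    final result and the names of the steps that completed.
    """
    completed: List[str] = []
    for name, step in steps:
        try:
            payload = step(payload)
        except Exception as exc:
            # Stop the sequence and report how far it got, for later resumption.
            raise StepFailed(name, list(completed), exc) from exc
        completed.append(name)
    return payload, completed
```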
The RIAA and MPAA also required multiple layers of encryption on each file before upload, with corresponding layers of decryption after download. Scripts were written to prepare each encrypted file for upload. Each file was decrypted first, then broken into separate chunks of data to be reassembled and re-encrypted once on Glacier. The same process was repeated in reverse for download. Given the available bandwidth, breaking up the files let iN DEMAND upload and download files concurrently and more quickly. Each functional component was also rigorously unit tested, vetting the system for clean, working code as it was built.
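Chunked transfers also need per-chunk integrity checks. Glacier verifies uploads with a SHA-256 tree hash. Each 1 MiB chunk is hashed, and pairs of digests are hashed together level by level until one root remains. The following sketch computes that value with the standard `hashlib` module. It illustrates the checksum Glacier expects; it is not MiniDAM’s encryption or chunking code. The function name `tree_hash` is hypothetical.

```python
import hashlib
from typing import List

MiB = 1024 * 1024


def tree_hash(data: bytes, chunk_size: int = MiB) -> str:
    """Compute a SHA-256 tree hash in the style Glacier uses for uploads.

    Each chunk is hashed; adjacent digests are concatenated and hashed
    pairwise, level by level, until one digest remains. An odd digest at
    the end of a level is carried up unchanged.
    """
    if not data:
        # Edge case (assumption): treat empty input as one empty chunk.
        return hashlib.sha256(b"").hexdigest()
    level: List[bytes] = [hashlib.sha256(data[i:i + chunk_size]).digest()
                          for i in range(0, len(data), chunk_size)]
    while len(level) > 1:
        next_level: List[bytes] = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                next_level.append(level[i])  # odd one out moves up as-is
        level = next_level
    return level[0].hex()
```

For data no larger than 1 MiB, the tree hash equals a plain SHA-256 of the data. Larger archives combine chunk digests, so each part of a multipart upload can be checked independently.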
The final puzzle was how to make uploads and downloads both multi-threaded and resumable. Transferring a seven-gigabyte movie to and from Glacier takes hours end-to-end once all of the checks and encryption monitored by the Celery process have been applied. Many kinds of errors can interrupt a transfer, from network failures to a minor flaw in one movie file’s metadata.
To make MiniDAM reliable, each file transfer needed to restart gracefully if an error occurred. Multiple monitoring scripts were created to review the status of files in process, return troubled files to the queue of scheduled tasks and restart their transfers to the archive. Because the new service needed to be cost-effective, we then created the online dashboard. It lets iN DEMAND staff check speed benchmarks and confirm they are not exceeding their Glacier cost limits.
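One way to make a transfer resumable is to record which parts have already succeeded. On restart, only the missing parts are sent, and transient errors are retried with exponential backoff. The following sketch assumes parts are numbered. It also assumes that `upload_part` (hypothetical) raises an `OSError` subclass such as `ConnectionError` on failure. It is not MiniDAM’s monitoring code.

```python
import time
from typing import Callable, Dict, Set


def resume_upload(parts: Dict[int, bytes], completed: Set[int],
                  upload_part: Callable[[int, bytes], None],
                  max_attempts: int = 3, base_delay: float = 1.0) -> Set[int]:
    """Upload only parts missing from `completed`, retrying failures.

    `completed` is updated in place, so progress survives even if a part
    ultimately fails and the last error is re-raised for a monitor to
    re-queue the file. Retries wait base_delay, then 2x, 4x, and so on.
    """
    for number in sorted(parts):
        if number in completed:
            continue  # already transferred before the interruption
        for attempt in range(1, max_attempts + 1):
            try:
                upload_part(number, parts[number])
                completed.add(number)
                break
            except OSError:
                if attempt == max_attempts:
                    raise
                time.sleep(base_delay * 2 ** (attempt - 1))
    return completed
```

Because `completed` can be persisted (for example in a database row per file), a restarted job picks up where the failed one stopped instead of re-sending hours of data.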
Results
MiniDAM now uploads up to 2 terabytes of data daily, with plans to increase to 6.5 terabytes. Its security, ease of use and reliability earned the client recognition from the wider Amazon Web Services community.
The MiniDAM tool and its associated dashboard launched in early April 2013. MiniDAM successfully replaced the manual process of movie storage and retrieval and no longer requires file duplication. It now regularly uploads up to 2 terabytes of data to Glacier daily, and future versions are planned to upload 6.5 terabytes a day. When a customer needs a particular movie in a specific language and at a set bit rate, staff can run a simple search to find the precise version and start its download. Entire libraries of a particular artist’s work can now be searched online in the archive and shown to interested clients for potential distribution.
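A version-specific lookup like this amounts to filtering archive records on title, language and bit rate. The following sketch uses a hypothetical `MovieVersion` record and `find_versions` helper. The field names are assumptions, not MiniDAM’s schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class MovieVersion:
    """Hypothetical archive record; field names are illustrative only."""
    title: str
    language: str
    bitrate_mbps: float
    archive_id: str


def find_versions(catalog: Iterable[MovieVersion], title: str,
                  language: Optional[str] = None,
                  bitrate_mbps: Optional[float] = None) -> List[MovieVersion]:
    """Return archived versions matching the title and any given filters."""
    matches = []
    for version in catalog:
        if version.title.lower() != title.lower():
            continue  # title match is case-insensitive
        if language is not None and version.language != language:
            continue
        if bitrate_mbps is not None and version.bitrate_mbps != bitrate_mbps:
            continue
        matches.append(version)
    return matches
```

In a Django application, the same query would more likely be an ORM filter such as `MovieVersion.objects.filter(title__iexact=title, language=language)`. That form again assumes a model with similar fields exists.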
The application’s success led to client recognition at AWS re:Invent, the largest gathering of the global Amazon Web Services community, and MiniDAM became a best-practice example for the broader Amazon.com community.
Caktus and iN DEMAND have continued the partnership with additional contracts to build key features as new edge cases have emerged. At iN DEMAND’s request, we have added reporting features that refine the view of the cumulative daily volume of data uploaded to Glacier. We have also added support for exporting data sets as CSV and PDF. Work is currently underway to scale MiniDAM’s core functions far beyond their current capacity to take advantage of growing available bandwidth.
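The cumulative daily report and its CSV export could be built along these lines with Python’s standard `csv` module. The input format (pairs of completion time and bytes uploaded), the function names and the column names are assumptions for illustration. PDF export is not shown.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def daily_upload_totals(events: Iterable[Tuple[datetime, int]]) -> Dict[str, int]:
    """Sum uploaded bytes per calendar day, keyed by ISO date string."""
    totals: Dict[str, int] = defaultdict(int)
    for finished_at, size_bytes in events:
        totals[finished_at.date().isoformat()] += size_bytes
    return dict(totals)


def totals_to_csv(totals: Dict[str, int]) -> str:
    """Render daily and running cumulative totals as CSV, oldest day first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["date", "bytes_uploaded", "cumulative_bytes"])
    running = 0
    for day in sorted(totals):  # ISO dates sort chronologically as strings
        running += totals[day]
        writer.writerow([day, totals[day], running])
    return buffer.getvalue()
```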