Testing Distributed Systems With Docker and AWS for the Cost of a Large Pizza

Testing distributed systems at scale is typically costly yet necessary. At Alluxio, we take testing very seriously because organizations across the world rely on our technology, so one problem we want to solve is how to test at scale without breaking the bank. In this blog, we show how the maintainers of the Alluxio open source project build and test the system at scale, cost-effectively, using public cloud infrastructure. We test with the most popular compute frameworks, such as Spark and Hive, and pervasive storage systems, such as HDFS and S3. Using Amazon EC2, we are able to test clusters of 1,000+ workers at a cost of about $16 per hour.
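The core trick is packing many lightweight Alluxio worker containers onto a handful of EC2 instances rather than paying for one machine per worker. Here is a minimal sketch of that idea using the docker-py SDK; the image tag, master hostname, and per-host worker count are illustrative assumptions, not the exact configuration from our test harness (the white paper covers the full setup):

```python
# Sketch: simulate a large Alluxio cluster by running many worker
# containers on a single EC2 host. Values below are hypothetical.
import docker

client = docker.from_env()

MASTER_HOSTNAME = "alluxio-master"  # hypothetical master address
WORKERS_PER_EC2_HOST = 50           # pack many workers onto one instance

for i in range(WORKERS_PER_EC2_HOST):
    client.containers.run(
        "alluxio/alluxio:2.9.3",    # official Alluxio image (tag assumed)
        "worker",                   # entrypoint argument: start a worker
        name=f"alluxio-worker-{i}",
        detach=True,
        shm_size="1G",              # shared memory backing the worker ramdisk
        environment={
            "ALLUXIO_JAVA_OPTS": f"-Dalluxio.master.hostname={MASTER_HOSTNAME}",
        },
    )
```

Because each container gets its own IP on the Docker network, the workers register with the master as distinct nodes, which is what lets a few instances stand in for a 1,000+ worker cluster.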

This blog is an abbreviated version; read the full-length technical white paper for the complete set of takeaways.

Top 10 Tips for Making the Spark + Alluxio Stack Blazing Fast

The Apache Spark + Alluxio stack is growing in popularity, particularly for unifying data access across S3 and HDFS. In addition, compute and storage are increasingly being separated, which introduces higher latencies for queries. Alluxio serves as compute-side virtual storage to improve performance. But to get the best performance, as with any technology stack, you need to follow the best practices. This article provides the top 10 tips for performance tuning real-world workloads when running Spark on Alluxio with data locality, getting the most bang for the buck.
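To make the unification point concrete, here is a minimal PySpark sketch. It assumes the Alluxio client jar is on the Spark classpath and that S3 and HDFS have already been mounted into the Alluxio namespace; the mount points, master address, and paths are hypothetical:

```python
# Sketch: once S3 and HDFS are mounted into the Alluxio namespace,
# Spark addresses both through the single alluxio:// scheme.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("alluxio-unified-read").getOrCreate()

# Both reads go through Alluxio; the under storage (S3 vs. HDFS)
# is transparent to the Spark job.
s3_logs  = spark.read.parquet("alluxio://alluxio-master:19998/mnt/s3/logs/")
hdfs_tbl = spark.read.parquet("alluxio://alluxio-master:19998/mnt/hdfs/warehouse/")
```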

A Note on Data Locality

High data locality can greatly improve the performance of Spark jobs. When locality is achieved, Spark tasks read data cached in Alluxio from a local Alluxio worker at memory speed (when a ramdisk is configured) instead of transferring the data over the network. The first few tips relate to locality.
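One knob worth knowing in this context is Spark's `spark.locality.wait`, which controls how long the scheduler waits for a locality-preserving slot before falling back to a non-local placement. A minimal sketch follows; the wait value and path are illustrative, not a recommendation from the tips themselves:

```python
# Sketch: a longer spark.locality.wait gives the scheduler more time to
# place a task on the executor co-located with the Alluxio worker that
# holds the cached block. Values below are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("alluxio-locality")
    .config("spark.locality.wait", "6s")  # Spark's default is 3s
    .getOrCreate()
)

df = spark.read.parquet("alluxio://alluxio-master:19998/data/events/")
df.count()  # NODE_LOCAL tasks read from the local worker's ramdisk
```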