AWS CloudWatch + yCrash = Monitoring + RCA

We had an outage in our online application GCeasy on Monday morning (PST), Oct 11, 2021. When customers uploaded their Garbage Collection logs for analysis, the application was returning an HTTP 504 error. The HTTP 504 (Gateway Timeout) status code indicates that an upstream server failed to respond in time, i.e., transactions were timing out. In this post, we would like to document our journey to identify the root cause of the problem.

Application Stack

Here are the primary components of the application's technology stack:

How Carbon Uses PrestoDB With Ahana to Power Real-Time Customer Dashboards

The author, Jordan Hoggart, was not compensated by Ahana for this review.

The Background

At the base of Carbon’s real-time, first-party data platform is our analytics component, which combines a range of behavioral, contextual, and revenue data and displays it within a dashboard as a series of charts, graphs, and breakdowns, giving a visual representation of the most important actionable data. Whilst we pre-calculate as much of the information as possible, different filters allow users to drill deeper into the data, which makes querying critical.

AWS S3 Client-side Encryption in AWS SDK .NET Core

When you upload data to an S3 bucket, you need to ensure that sensitive data is secured with proper encryption. Amazon S3 allows you to encrypt data or objects either on the server side or on the client side.

Here, I will use client-side encryption for data before sending it to Amazon S3 using the AWS SDK for .NET Core. The advantage of client-side encryption is that encryption is performed locally, so the data never leaves the execution environment unencrypted. Another advantage is that you can use your own master keys for encryption, and no one can access your data without your master encryption keys.
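The flow above can be sketched language-agnostically; the snippet below uses Python rather than .NET purely for brevity. It shows the envelope-encryption bookkeeping: the object is encrypted locally with a one-time data key, the data key is wrapped with your master key, and the wrapped key travels with the object as user metadata. The metadata key names here are illustrative, not the exact ones the AWS SDK encryption client uses.

```python
import base64

def envelope_metadata(wrapped_data_key, cek_algorithm="AES/GCM/NoPadding"):
    """Assemble the user metadata that travels with the ciphertext so the
    client can unwrap the data key and decrypt after download.
    Metadata key names are illustrative placeholders."""
    return {
        "x-amz-meta-cek-alg": cek_algorithm,
        "x-amz-meta-wrapped-key": base64.b64encode(wrapped_data_key).decode("ascii"),
    }

meta = envelope_metadata(b"\x00\x01\x02\x03")
# The ciphertext plus this metadata is what actually gets uploaded;
# the plaintext and the unwrapped data key never leave the client.
```

The key point of the pattern is visible in the helper: S3 only ever stores ciphertext plus an already-wrapped key, so nothing stored server-side is usable without your master key.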

Using AWS KMS Server-Side Encryption With the Mule 4 AWS S3 Connector

Purpose

To demonstrate MuleSoft integration with an S3 bucket that has KMS (server-side encryption) enabled.

Table of Contents

  • What is AWS KMS?
  • Key rotation.
  • AWS configuration for KMS and S3 bucket.
  • Mule 4 connector configuration.
  • Tutorial video.

Scenarios

  • Publish data to the S3 bucket while server-side encryption is enabled on the bucket.
  • Publish data to the S3 bucket while server-side encryption is disabled on the bucket.
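Outside Mule, the two scenarios above differ by a single set of request parameters on the S3 API. A hedged Python/boto3 sketch for comparison (the bucket name and key alias are placeholders; `put_object` and the `ServerSideEncryption`/`SSEKMSKeyId` parameter names are real boto3 API):

```python
def sse_kms_args(kms_key_id=None):
    """Extra put_object arguments: empty for the SSE-disabled scenario,
    SSE-KMS headers when a KMS key id or alias is supplied."""
    if kms_key_id is None:
        return {}
    return {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": kms_key_id}

args = sse_kms_args("alias/my-app-key")  # placeholder key alias
# import boto3
# s3 = boto3.client("s3")
# s3.put_object(Bucket="my-bucket", Key="data.json", Body=b"{}", **args)
```

The Mule 4 connector configuration ultimately sets these same encryption headers on the upload request.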

What Is AWS KMS?

AWS Key Management Service (AWS KMS) is a managed service that makes it easy to create and manage the encryption keys used to encrypt your data.

Scan an AWS S3 File for Viruses in Java

The increased use of cloud storage is also increasing the attention it gets from potential cyber attackers. End-users are able to upload viruses, and attackers can craft specialized attack malware and upload this content as well. Once these threats are uploaded, they can flow through your systems, hiding themselves in cloud storage or databases, and could eventually get executed.

Consider the following situation: an attacker uploads a custom executable file into a financial company’s cloud storage database, and the system accepts it. The virus is missed by the company’s minimal virus scan software, so it continues to infiltrate other critical business applications. Eventually, it’s downloaded by a financial manager, resulting in an endpoint being infected with an Advanced Persistent Threat (APT).
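As a minimal illustration of the scanning hook such a pipeline needs (shown in Python for brevity, though the article targets Java), the function below checks a downloaded object's bytes for the standard EICAR test signature. This is not an antivirus; a production system would hand the bytes to a real scanning engine.

```python
# The EICAR string is the industry-standard, harmless test signature
# that antivirus engines detect, so the flow can be exercised safely.
EICAR_SIGNATURE = b"X5O!P%@AP[4\\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"

def looks_infected(payload: bytes) -> bool:
    """Toy check: flag any payload containing the EICAR test signature."""
    return EICAR_SIGNATURE in payload

# In the real flow the payload would come from S3 before acceptance, e.g.:
# body = s3.get_object(Bucket="uploads", Key=key)["Body"].read()
# then quarantine the object if looks_infected(body) is True.
```

The essential design point is that the scan happens before the object is allowed to "flow through your systems," not after it lands in downstream storage.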

Exposed AWS Secret Access Key To GitHub Can Be a Costly Affair – A Personal Experience

I would like to share an experience related to securely storing access keys and the billing of an AWS cloud account.

Six years back, I started using AWS Cloud services for one of our project requirements. It was exciting to start working on the cloud. We started exploring and using different services. For one of the use cases, we needed to store some documents that should be secure, easily managed, and would grow as the project's features grew. We opted for AWS Simple Storage Service (S3).

Reading AWS S3 File Content to Kafka Topic

Apache Camel

Apache Camel is an open-source framework for message-oriented middleware with a rule-based routing and mediation engine. It provides a Java object-based implementation of the Enterprise Integration Patterns, using an application programming interface to configure routing and mediation rules.

Red Hat AMQ Streams

Red Hat AMQ Streams is a massively-scalable, distributed, and high-performance data streaming platform based on the Apache ZooKeeper and Apache Kafka projects.

Upload Files to Google Cloud Storage with Python

Google Cloud is a suite of cloud-based services, just like AWS from Amazon and Azure from Microsoft. AWS and Azure dominate the market, but Google is not far behind. Google Cloud Platform, or GCP, is the third-largest cloud computing platform in the world, with a share of 9%, closely followed by Alibaba Cloud.

Amazon undoubtedly leads the market with a share of 33%, but GCP showed a tremendous spike with a whopping growth rate of 83% in 2019. GCP leads AWS on the cost front, though: Google has fewer services to offer but maintains its position as one of the most cost-effective cloud platforms.
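A minimal sketch of the upload itself with the official `google-cloud-storage` client library (`pip install google-cloud-storage`); the bucket and file names are placeholders, and credentials are assumed to be configured via `GOOGLE_APPLICATION_CREDENTIALS`:

```python
def gcs_uri(bucket_name, destination_name):
    """The gs:// URI the uploaded object will have."""
    return f"gs://{bucket_name}/{destination_name}"

def upload_file(bucket_name, source_path, destination_name):
    # Deferred import so the pure helper above works without the client installed.
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(destination_name)
    blob.upload_from_filename(source_path)  # streams the local file to GCS
    return gcs_uri(bucket_name, destination_name)

# upload_file("my-bucket", "report.csv", "backups/report.csv")
```

`Client`, `bucket`, `blob`, and `upload_from_filename` are the standard names in the google-cloud-storage Python client.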

Reducing Large S3 API Costs Using Alluxio

I. Introduction

Previous Works

There have been numerous articles and online webinars dealing with the benefits of using Alluxio as an intermediate storage layer between the S3 data storage and the data processing system used for ingestion or retrieval of data (e.g., Spark or Presto), as depicted in the picture below:
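The cost argument is easy to see with back-of-the-envelope arithmetic. The figures below are assumptions for illustration only (S3 GET pricing is on the order of $0.0004 per 1,000 requests; the request volume and cache-hit ratio are made up):

```python
def s3_get_cost(requests, cache_hit_ratio=0.0, price_per_1000=0.0004):
    """Cost of the GET requests that still reach S3 after the cache absorbs hits."""
    misses = requests * (1.0 - cache_hit_ratio)
    return misses / 1000 * price_per_1000

no_cache = s3_get_cost(100_000_000)            # every read goes to S3
with_alluxio = s3_get_cost(100_000_000, 0.9)   # 90% of reads served from cache
# Roughly $40 vs. $4 per billing period under these assumed numbers.
```

The same proportional saving applies to LIST and HEAD calls, which is why an intermediate cache layer pays off quickly at high request volumes.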

To name a few use cases:

Spring2quarkus — Spring Boot to Quarkus Migration

Time to boot up your Spring with Quarkus. 


Recently, the "fattest" of my Spring Boot-based microservices became too big. The entire project was hosted on AWS EC2, and the instances used were t2.micro or t3.micro. The service started going down very often, even with minimal load on it. The obvious option was to choose a bigger instance for the service (t3.small), which I did initially.

Enforcing and Monitoring Security on AWS S3

I am an avid follower of the AWS Online Tech Talks YouTube channel. It is a useful way to stay up-to-date on new or existing AWS features and services; I find it helpful for refreshing and retaining knowledge. Recently, I encountered a webinar about AWS S3 security, which prompted me to take another look at my S3 policies and settings. I decided to consolidate some S3 security features and properties. In this article, I'll discuss the changes I made, along with some examples and my two cents.

What’s the Incentive?

Typically, in my day-to-day use of S3, security and permissions are not changed regularly. In most cases, we set the security definitions when the S3 bucket is created and then forget about them. We do not bother to revalidate these security settings periodically.
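One concrete example of a set-and-forget control worth revalidating is a bucket policy that rejects unencrypted transport. The statement below follows the standard IAM policy grammar, built here in Python for readability (the bucket name is a placeholder):

```python
def deny_insecure_transport(bucket_name):
    """Bucket-policy statement denying any request not made over TLS."""
    return {
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            f"arn:aws:s3:::{bucket_name}",
            f"arn:aws:s3:::{bucket_name}/*",
        ],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }

policy = {"Version": "2012-10-17",
          "Statement": [deny_insecure_transport("my-bucket")]}
# json.dumps(policy) is what you would pass to put_bucket_policy.
```

Because the `Deny` applies to every principal and action, it wins over any `Allow` elsewhere in the policy, which is exactly the guardrail behavior you want for a setting nobody revisits.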

Introducing Wormhole: Fast Dockerized Presto and Alluxio Setups

Just like a real wormhole, this tool is all about speed.

This blog introduces Wormhole, an open-source Dockerized solution for deploying Presto and Alluxio clusters for blazing-fast analytics on file systems (we use S3, GCS, OSS). When it comes to analytics, people are generally hands-on at writing SQL queries and love to analyze data that resides in a warehouse (e.g., a MySQL database). But as data grows, these stores start failing, and there arises a need to get results in the same or a shorter time frame. This can be solved by distributed computing, and Presto is designed for that. When attached to Alluxio, it works even faster. That’s what Wormhole is all about.

Here is the high-level architecture diagram of the solution:

Running Alluxio-Presto Sandbox in Docker

The Alluxio-Presto sandbox is a Docker application featuring installations of MySQL, Hadoop, Hive, Presto, and Alluxio. The sandbox lets you easily dive into an interactive environment where you can explore Alluxio, run queries with Presto, and see the performance benefits of using Alluxio in a big data software stack.

In this guide, we’ll be using Presto and Alluxio to showcase how Alluxio can improve Presto’s query performance by caching our data locally so that it can be accessed at memory speed!

Distributed Data Querying With Alluxio

This blog is about how I used Alluxio to reduce p99 and p50 query latencies and optimize the overall platform costs for a distributed querying application. I walk through the product and architecture decisions that led to our final architecture, discuss the tradeoffs, share some statistics on the improvements, and discuss future improvements to the system.

Description

A wireframe of a dashboard with drag-and-drop functionality.