Geek Reading – Cloud, SQL, NoSQL, HTML5

I have talked about human filters and my plan for digital curation. These items are the fruits of those ideas, the items I deemed worthy from my Google Reader feeds. These items are a combination of tech business news, development news and programming tools and techniques.

I hope you enjoy today’s items, and please participate in the discussions on those sites.

DynamoDB Partition Key Strategies for SaaS

Amazon DynamoDB is a fully managed NoSQL database service built for scalability and high performance. It's one of the most popular databases used at SaaS companies. We selected DynamoDB for the same reasons as everyone else: autoscaling, low cost, zero downtime. However, at scale, DynamoDB can present serious performance issues.

SaaS applications commonly follow a multi-tenant architecture, in which a single instance of the software serves many customers. At scale, this often leads to hot-key problems caused by uneven partitioning of data in Amazon DynamoDB; there are two solutions that allow the system to keep scaling. When using Amazon DynamoDB for a multi-tenant solution, you need to know how to effectively partition the tenant data to prevent performance bottlenecks as the application grows over time.
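
One common mitigation for hot partitions is write sharding: spreading a hot tenant's items across several sub-partitions by suffixing the partition key. A minimal sketch in Node.js (the shard count and table name are hypothetical; the AWS SDK v2 DocumentClient calls are standard):

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();
const SHARD_COUNT = 10; // hypothetical; sized to the tenant's traffic

// Writes scatter a hot tenant's items across sub-partitions.
function shardedKey(tenantId) {
  return tenantId + '#' + Math.floor(Math.random() * SHARD_COUNT);
}

// Reads must fan out across all shards and merge the results.
async function queryTenant(tenantId) {
  const pages = await Promise.all(
    Array.from({ length: SHARD_COUNT }, (_, shard) =>
      docClient.query({
        TableName: 'tenant-data', // hypothetical table name
        KeyConditionExpression: 'pk = :pk',
        ExpressionAttributeValues: { ':pk': tenantId + '#' + shard },
      }).promise()
    )
  );
  return pages.flatMap(p => p.Items);
}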

5 CDK Lessons Learned

The AWS Cloud Development Kit (CDK) allows you to define your AWS resources using the programming languages you know and love. This concept piqued the interest of many of us here at Instil; when someone offers us the ability to use TypeScript instead of YAML, we're sold!

I have been using CDK for the past three years on container-based and serverless projects, and what I think is CDK's greatest strength is the guard rails it provides to the developer:
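
To illustrate those guard rails, a minimal sketch (in JavaScript, though the authors prefer TypeScript; the construct names come from aws-cdk-lib): the construct API validates properties when the app synthesizes, so typos and invalid combinations surface long before a deploy, something raw YAML templates cannot offer.

const cdk = require('aws-cdk-lib');
const dynamodb = require('aws-cdk-lib/aws-dynamodb');

class ApiStack extends cdk.Stack {
  constructor(scope, id, props) {
    super(scope, id, props);
    // Misconfigure a property here and `cdk synth` fails immediately.
    new dynamodb.Table(this, 'EventsTable', {
      partitionKey: { name: 'pk', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
    });
  }
}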

AWS DynamoDB Table Design in 10 Minutes

AWS DynamoDB is a serverless key-value NoSQL database. Its schema design concepts differ from those of a traditional relational database. This video will help you understand DynamoDB table design in under 10 minutes.

Thank you for watching!

CRUD with DynamoDB Using Serverless and NodeJS

Introduction

In this post, we are going to see how to build a CRUD application using DynamoDB, the Serverless Framework, and NodeJS. We will cover all the CRUD operations: DynamoDB GetItem, PutItem, UpdateItem, DeleteItem, and listing all items in a table. Everything will be done using the Serverless Framework on NodeJS. This is part 1 of the series; in part 2 we will add authentication to the application. For now, let's get started.
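
As a hedged sketch of what one such operation looks like (the handler name and TABLE_NAME environment variable are hypothetical; the DocumentClient call is the standard AWS SDK v2 API):

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

// PutItem: create a record from the request body.
module.exports.createItem = async (event) => {
  const body = JSON.parse(event.body);
  await docClient.put({
    TableName: process.env.TABLE_NAME, // would be set in serverless.yml
    Item: { id: body.id, content: body.content },
  }).promise();
  return { statusCode: 201, body: JSON.stringify({ created: body.id }) };
};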

Project Setup

Our project folder structure will look like this:

Stopping Cybersecurity Threats: Why Databases Matter

From intrusion detection to threat analysis to endpoint security, the effectiveness of cybersecurity efforts often boils down to how much data can be processed in real-time with the most advanced algorithms and models.

Many factors are obviously involved in stopping cybersecurity threats effectively. However, the databases responsible for processing the billions or trillions of events per day (from millions of endpoints) play a particularly crucial role. High throughput and low latency directly correlate with better insights as well as more threats discovered and mitigated in near real-time. Cybersecurity data-intensive systems are incredibly complex: many span 4+ data centers with database clusters exceeding 1000 nodes and petabytes of heterogeneous data under active management.

DynamoDB Autoscaling Dissected: When a Calculator Beats a Robot

TL;DR: Choosing the Right Mode for DynamoDB Scaling

Making sense of the multitude of scaling options available for DynamoDB can be quite confusing, but running a short checklist with a calculator can go a long way to help.

  1. Follow the flowchart below to decide which mode to use.
  2. If you have historical data of your database load (or an estimate of the load pattern), create a histogram or a percentile curve of the load (aggregated by hours used) – this is the easiest way to see how many reserved units to pre-purchase. As a rule of thumb, purchase reservations for units used over 32% of the time when accounting for partial usage, and 46% of the time when not (see the sketch after this list).
  3. When in doubt, opt for static provisioning unless your top priority is avoiding being out of capacity – even at extreme costs.
  4. Configure scaling limits (both upper and lower) for provisioned autoscaling. You want to avoid running out of capacity during outages, as well as extreme costs in case of a rogue overload (DDoS, anyone?).
  5. Remember that there is no upper limit on DynamoDB on-demand billing other than the table’s scaling limit (which you may have requested raising for performance reasons). Make sure to configure billing alerts and respond quickly when they fire.
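
The percentile check in item 2 is easy to script. A minimal sketch in Node.js (hourlyUsage is a hypothetical array of consumed capacity units per hour; the 32% threshold comes from the rule of thumb above):

// Returns the largest capacity level in use for at least `threshold`
// of all sampled hours - a candidate amount of reserved capacity.
function reservedUnits(hourlyUsage, threshold = 0.32) {
  const sorted = [...hourlyUsage].sort((a, b) => b - a); // descending
  const idx = Math.max(Math.ceil(threshold * sorted.length) - 1, 0);
  return sorted[idx];
}

// Example: these 8 samples are at or above 110 units 3/8 (~38%) of the time.
console.log(reservedUnits([100, 120, 80, 500, 90, 110, 95, 105])); // 110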

The Long Version: Configuring DynamoDB Tables

Before we dive in, it's useful to be reminded of DynamoDB's different service models and their scaling characteristics: DynamoDB tables can be configured as either "provisioned capacity" or "on-demand", and there's a cooldown period of 24 hours before you can switch again.
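
Switching modes is a single UpdateTable call. A hedged sketch with AWS SDK v3 (the table name is a placeholder), keeping the 24-hour cooldown above in mind:

const { DynamoDBClient, UpdateTableCommand } = require('@aws-sdk/client-dynamodb');

async function switchToOnDemand(tableName) {
  const client = new DynamoDBClient({});
  // 'PROVISIONED' instead requires a ProvisionedThroughput setting.
  await client.send(new UpdateTableCommand({
    TableName: tableName,
    BillingMode: 'PAY_PER_REQUEST',
  }));
}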

Useful Tools for Local Development With AWS Services

Over the last 2.5 years, I've been working with AWS and a wide range of its services. During this time, I noticed that for most projects, it's useful to be able to test your application against AWS services without having to deploy or move your code into the cloud. There are several free solutions available for you to use depending on the services required by your project. In this post, I'll describe some of the tools that I use.

DynamoDB Local

At one of my previous projects, we made extensive use of the combination of DynamoDB and Elasticsearch for storing and querying data. The fact that DynamoDB is a managed database service with immense scale and performance benefits makes it a great fit for high-traffic applications.
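
Pointing the SDK at a local instance only requires overriding the endpoint. A minimal sketch with AWS SDK v3 (DynamoDB Local's default port is 8000; the dummy credentials are placeholders it ignores):

const { DynamoDBClient, ListTablesCommand } = require('@aws-sdk/client-dynamodb');

const local = new DynamoDBClient({
  endpoint: 'http://localhost:8000',
  region: 'local', // any value works against DynamoDB Local
  credentials: { accessKeyId: 'fake', secretAccessKey: 'fake' },
});

local.send(new ListTablesCommand({})).then(r => console.log(r.TableNames));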

How to Make GraphQL and DynamoDB Play Nicely Together

Serverless, GraphQL, and DynamoDB are a powerful combination for building websites. The first two are well-loved, but DynamoDB is often misunderstood or actively avoided. It’s often dismissed by folks who consider it only worth the effort “at scale.”

That was my assumption, too, and I tried to stick with a SQL database for my serverless apps. But after learning and using DynamoDB, I see the benefits of it for projects of any scale.

To show you what I mean, let’s build an API from start to finish — without any heavy Object Relational Mapper (ORM) or GraphQL framework to hide what is really going on. Maybe when we’re done you might consider giving DynamoDB a second look. I think it is worth the effort.

The main objections to DynamoDB and GraphQL

The main objection to DynamoDB is that it is hard to learn, but few people argue about its power. I agree the learning curve feels very steep. But SQL databases are not the best fit with serverless applications. Where do you stand up that SQL database? How do you manage connections to it? These things just don’t mesh with the serverless model very well. DynamoDB is serverless-friendly by design. You are trading the up-front pain of learning something hard to save yourself from future pain. Future pain that only grows if your application grows.

The case against using GraphQL with DynamoDB is a little more nuanced. GraphQL seems to fit well with relational databases partly because it is assumed by a lot of the documentation, tutorials, and examples. Alex Debrie is a DynamoDB expert who wrote The DynamoDB Book which is a great resource to deeply learn it. Even he recommends against using the two together, mostly because of the way that GraphQL resolvers are often written as sequential independent database calls that can result in excessive database reads.

Another potential problem is that DynamoDB works best when you know your access patterns beforehand. One of the strengths of GraphQL is that, by design, it handles arbitrary queries more easily than REST. This is more of a problem with a public API where users can write arbitrary queries. In reality, GraphQL is often used for private APIs where you control both the client and the server, so you know and can control the queries you run. Still, with any GraphQL API it is possible to write queries that clobber the database if you don't take steps to avoid them.

A basic data model

For this example API, we will model an organization with teams, users, and certifications. The entity relational diagram is shown below. Each team has many users and each user can have many certifications.

Relational database model

Our end goal is to model this data in a DynamoDB table, but if we did model it in a SQL database, it would look like the following diagram:

To represent the many-to-many relationship of users to certifications, we add an intermediate table called "Credential." The only unique attribute on this table is the expiration date. There would be other attributes for each of the tables, but for simplicity we reduce each to just a name.

Access patterns

The key to designing a data model for DynamoDB is to know your access patterns up front. In a relational database you start with normalized data and perform joins across the data to access it. DynamoDB does not have joins, so we build a data model that matches how we intend to access it. This is an iterative process. The goal is to identify the most frequent patterns to start. Most of these will directly map to a GraphQL query, but some may be only used internally to the back end to authenticate or check permissions, etc. An access pattern that is rarely used, like a check run once a week by an administrator, does not need to be designed. Something very inefficient (like a table scan) can handle these queries.

Most frequently accessed:

  • User by ID or name
  • Team by ID or name
  • Certification by ID or name

Frequently accessed:

  • All Users on a Team by Team ID
  • All Certifications for a given User
  • All Teams
  • All Certifications

Rarely accessed:

  • All Certifications of users on a Team
  • All Users who have a Certification
  • All Users who have a Certification on a Team

DynamoDB single table design

DynamoDB does not have joins and you can only query based on the primary key or predefined indexes. There is no set schema for items imposed by the database, so many different types of items can be stored in a single table. In fact, the recommended best practice for your data schema is to store all items in a single table so that you can access related items together with a single query. Below is a single table model representing our data. To design this schema, you take the access patterns above and choose attributes for the keys and indexes that match.

The primary key here is a composite of the partition/hash key (pk) and the sort key (sk). To retrieve an item in DynamoDB, you must specify the partition key exactly, along with either a single value or a range of values for the sort key. This allows you to retrieve more than one item if they share a partition key. The indexes here are shown as gsi1pk, gsi1sk, etc. Generic attribute names (e.g., gsi1pk) are used for the indexes so that the same index can serve different types of items with different access patterns. With a composite key, the sort key cannot be empty, so we use “#” as a placeholder when the sort key is not needed.

Access pattern | Query conditions
Team, User, or Certification by ID | Primary key: pk=”T#”+ID, sk=”#”
Team, User, or Certification by name | Index GSI 1: gsi1pk=type, gsi1sk=name
All Teams, Users, or Certifications | Index GSI 1: gsi1pk=type
All Users on a Team by ID | Index GSI 2: gsi2pk=”T#”+teamID
All Certifications for a User by ID | Primary key: pk=”U#”+userID, sk=”C#”+certID
All Users with a Certification by ID | Index GSI 1: gsi1pk=”C#”+certID, gsi1sk=”U#”+userID
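
As a concrete example of the “All Certifications for a User by ID” row above, a hedged sketch (AWS SDK v2 DocumentClient; the table name is hypothetical): the partition key is matched exactly while begins_with narrows the sort-key range.

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

docClient.query({
  TableName: 'singletable', // hypothetical table name
  KeyConditionExpression: 'pk = :pk AND begins_with(sk, :prefix)',
  ExpressionAttributeValues: { ':pk': 'U#u_01', ':prefix': 'C#' },
}).promise().then(data => console.log(data.Items));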

Database schema

We enforce the “database schema” in the application. The DynamoDB API is powerful, but also verbose and complicated. Many people jump directly to using an ORM to simplify it. Here, we will directly access the database using the helper functions below to create the schema for the Team item.

// Maps application-level entities to DynamoDB keys, items, and queries.
const DB_MAP = {
  TEAM: {
    // Key for reading a single team item.
    get: ({ teamId }) => ({
      pk: 'T#' + teamId,
      sk: '#',
    }),
    // Full item written for a team, including GSI 1 attributes
    // so teams can also be listed and looked up by name.
    put: ({ teamId, teamName }) => ({
      pk: 'T#' + teamId,
      sk: '#',
      gsi1pk: 'Team',
      gsi1sk: teamName,
      _tp: 'Team',
      tn: teamName,
    }),
    // Translates a raw database item back to the application model.
    parse: ({ pk, tn, _tp }) => {
      if (_tp === 'Team') {
        return {
          id: pk.slice(2), // strip the 'T#' prefix
          name: tn,
        };
      }
      return null;
    },
    // Look up a team by name via GSI 1.
    queryByName: ({ teamName }) => ({
      IndexName: 'gsi1pk-gsi1sk-index',
      ExpressionAttributeNames: { '#p': 'gsi1pk', '#s': 'gsi1sk' },
      KeyConditionExpression: '#p = :p AND #s = :s',
      ExpressionAttributeValues: { ':p': 'Team', ':s': teamName },
      ScanIndexForward: true,
    }),
    // List all teams via GSI 1.
    queryAll: {
      IndexName: 'gsi1pk-gsi1sk-index',
      ExpressionAttributeNames: { '#p': 'gsi1pk' },
      KeyConditionExpression: '#p = :p',
      ExpressionAttributeValues: { ':p': 'Team' },
      ScanIndexForward: true,
    },
  },
  // Parses either a bare array of items or a DynamoDB query
  // response ({ Items: [...] }) using the parser for `type`.
  parseList: (list, type) => {
    if (Array.isArray(list)) {
      return list.map(i => DB_MAP[type].parse(i));
    }
    if (Array.isArray(list.Items)) {
      return list.Items.map(i => DB_MAP[type].parse(i));
    }
    return [];
  },
};

To put a new team item in the database you call:

DB_MAP.TEAM.put({ teamId: 't_01', teamName: 'North Team' })

This forms the index and key values that are passed to the database API. The parse method takes an item from the database and translates it back to the application model.
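
Tracing through the put helper above, that call produces the item below; these are the attributes actually written to the table:

{
  pk: 'T#t_01',
  sk: '#',
  gsi1pk: 'Team',
  gsi1sk: 'North Team',
  _tp: 'Team',
  tn: 'North Team',
}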

GraphQL schema

type Team {
  id: ID!
  name: String
  members: [User]
}
type User {
  id: ID!
  name: String
  team: Team
  credentials: [Credential]
}
type Certification {
  id: ID!
  name: String
}
type Credential {
  id: ID!
  user: User
  certification: Certification
  expiration: String
}
type Query {
  team(id: ID!): Team
  teamByName(name: String!): [Team]
  user(id: ID!): User
  userByName(name: String!): [User]
  certification(id: ID!): Certification
  certificationByName(name: String!): [Certification]
  allTeams: [Team]
  allCertifications: [Certification]
  allUsers: [User]
}

Bridging the gap between GraphQL and DynamoDB with resolvers

Resolvers are where a GraphQL query is executed. You can get a long way in GraphQL without ever writing a resolver. But to build our API, we’ll need to write some. For each query in the GraphQL schema above there is a root resolver below (only the team resolvers are shown here). This root resolver returns either a promise or an object with part of the query results.

If the query returns a Team type as the result, then execution is passed down to the Team type resolver. That resolver has a function for each of the values in a Team. If there is no resolver for a given value (e.g., id), it will look to see if the root resolver already passed it down.

A query takes four arguments. The first, called root or parent, is an object passed down from the resolver above with any partial results. The second, called args, contains the arguments passed to the query. The third, called context, can contain anything the application needs to resolve the query. In this case, we add a reference for the database to the context. The final argument, called info, is not used here. It contains more details about the query (like an abstract syntax tree).

In the resolvers below, ctx.db.singletable is the reference to the DynamoDB table that contains all the data. The get and query methods execute directly against the database, and the DB_MAP.TEAM.... helpers we wrote earlier translate the schema to the database. The parse method translates the data back to the form needed for the GraphQL schema.

const resolverMap = {
  Query: {
    team: (root, args, ctx, info) => {
      return ctx.db.singletable.get(DB_MAP.TEAM.get({ teamId: args.id }))
        .then(data => DB_MAP.TEAM.parse(data));
    },
    teamByName: (root, args, ctx, info) => {
      return ctx.db.singletable
        .query(DB_MAP.TEAM.queryByName({ teamName: args.name }))
        .then(data => DB_MAP.parseList(data, 'TEAM'));
    },
    allTeams: (root, args, ctx, info) => {
      return ctx.db.singletable.query(DB_MAP.TEAM.queryAll)
        .then(data => DB_MAP.parseList(data, 'TEAM'));
    },
  },
  Team: {
    name: (root, _, ctx) => {
      if (root.name) {
        return root.name;
      } else {
        return ctx.db.singletable.get(DB_MAP.TEAM.get({ teamId: root.id }))
          .then(data => DB_MAP.TEAM.parse(data).name);
      }
    },
    members: (root, _, ctx) => {
      return ctx.db.singletable
        .query(DB_MAP.USER.queryByTeamId({ teamId: root.id }))
        .then(data => DB_MAP.parseList(data, 'USER'));
    },
  },
  User: {
    name: (root, _, ctx) => {
      if (root.name) {
        return root.name;
      } else {
        return ctx.db.singletable.get(DB_MAP.USER.get({ userId: root.id }))
          .then(data => DB_MAP.USER.parse(data).name);
      }
    },
    credentials: (root, _, ctx) => {
      return ctx.db.singletable
        .query(DB_MAP.CREDENTIAL.queryByUserId({ userId: root.id }))
        .then(data => DB_MAP.parseList(data, 'CREDENTIAL'));
    },
  },
};

Now let’s follow the execution of the query below. First, the team root resolver reads the team by id and returns id and name. Then the Team type resolver reads all the members of that team. Then the User type resolver is called for each user to get all of their credentials and certifications. If there are five members on the team and each member has five credentials, that results in a total of seven reads for the database. You could argue that is too many. In a SQL database this might be reduced to four database calls. I would argue that the seven DynamoDB reads will be cheaper and faster than the four SQL reads in many cases. But this comes with a big dose of “it depends” on a lot of factors.

query { team( id:"t_01" ){
  id
  name
  members{
    id
    name
    credentials{
      id
      certification{
        id
        name
      }
    }
  }
}}

Over-fetching and the N+1 problem

Optimizing a GraphQL API involves balancing a whole lot of tradeoffs that we won’t get into here. But two that weigh heavily in the decision of DynamoDB versus SQL are over-fetching and the N+1 problem. In many ways, these are opposite sides of the same coin. Over-fetching is when a resolver requests more data from the database than it needs to respond to the query. This often happens when you try to make one call to the database in the root resolver or a type resolver (e.g., members in the Team type resolver above) to get as much of the data as you can. If the query did not request the name attribute, it can be seen as wasted effort.

The N+1 problem is almost the opposite. If all the reads are pushed down to the lowest level resolver, then the team root resolver and the members resolver (for Team type) would make only a minimal or no request to the database. They would just pass the IDs down to the Team type and User type resolver. In this case, instead of members making one call to get all five members, it would push down to User to make five separate reads. This would result in potentially 36 or more separate reads for the query above. In practice, this does not happen because an optimized server would use something like the DataLoader library that acts as a middleware to intercept those 36 calls and batch them into probably only four calls to the database. These smaller atomic read requests are needed so that the DataLoader (or similar tool) can efficiently batch them into fewer reads.
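
For the curious, a hedged sketch of that batching layer (the dataloader npm package is real; batchGetUsers is a hypothetical helper that would issue a single BatchGetItem):

const DataLoader = require('dataloader');

// DataLoader coalesces every load() made in one tick of the event
// loop into a single call to the batch function below.
const userLoader = new DataLoader(async (ids) => {
  const users = await batchGetUsers(ids); // hypothetical BatchGetItem wrapper
  const byId = new Map(users.map(u => [u.id, u]));
  // DataLoader requires results in the same order as the requested keys.
  return ids.map(id => byId.get(id) || null);
});

// A resolver then calls: userLoader.load(root.userId)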

So, to optimize a GraphQL API backed by SQL, it is usually best to have small resolvers at the lowest levels and use something like DataLoader to optimize them. But for a DynamoDB API it is better to have “smarter” resolvers higher up that better match the access patterns your single-table database is written for. The over-fetching that results in this case is usually the lesser of the two evils.

Deploy this example in 60 seconds

This is where you realize the full payoff of using DynamoDB together with serverless GraphQL. I built this example with Architect. It is an open-source tool to build serverless apps on AWS without most of the headaches of directly using AWS. Once you clone the repo and run npm install, you can launch the app for local development (including a built-in local version of the database) with a single command. Not only that, you can also deploy it straight to production infrastructure (including DynamoDB) on AWS with a single command when you are ready.


The post How to Make GraphQL and DynamoDB Play Nicely Together appeared first on CSS-Tricks.

Amazon DynamoDB Integration

This article helps any new developer who wants to connect to Amazon DynamoDB.

Prerequisites:

  • The developer should have a working IDE.
  • The developer should have an AWS account with the DynamoDB service added.
  • A sample DynamoDB table should be created.

Configuration Needed:

In your classpath, you should have awsconfiguration.json created. For Android, this file should reside in the project's res/raw folder.
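
The exact contents depend on the services and authentication in use. As a hedged sketch, a minimal file for Cognito-based credentials might look like the following (the pool ID and Region are placeholders):

{
  "Version": "1.0",
  "CredentialsProvider": {
    "CognitoIdentity": {
      "Default": {
        "PoolId": "us-east-1:00000000-0000-0000-0000-000000000000",
        "Region": "us-east-1"
      }
    }
  }
}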

Decision Making: Relational or NoSQL

For some time, products have been leaning towards NoSQL databases because of the number of advantages they provide compared to relational databases (RDBMS), especially in today's distributed systems. There is always pressure to deliver faster and get things live to end users. But does that mean relational databases can't compete with NoSQL databases, given that relational databases are still best known for adhering to ACID properties?

Here, I am going to explore the possibilities from the relational database perspective and how far they have come to compete with NoSQL databases. I am going to compare two managed databases from AWS, DynamoDB and Aurora, to see if it's really worth favoring one type of database over the other.

Enhanced DynamoDB Client — Java Abstraction Code

DynamoDB introduced an enhanced client in 2020, bundled with the AWS SDK for Java 2.0. This client is now the suggested way to execute database operations on DynamoDB using application classes.

In my recent project, we had a few scenarios to build against —

Microservice: Async Rest Client to DynamoDB using Spring Boot

Overview

Starting with Spring Framework 5.0 and Spring Boot 2.0, the framework provides support for asynchronous programming, as does the AWS SDK starting with version 2.0.

In this post, I will explore the asynchronous DynamoDB API and Spring WebFlux by building a simple reactive REST application. Let's say we need to handle HTTP requests for retrieving or storing some Event (id: string, body: string). The event will be stored in DynamoDB.

DynamoDB Global Tables

In this article, we will create a DynamoDB table, make it global, and test it. Global Tables are a powerful feature, yet simple and easy to use.

Global Tables help customers deploy a multi-region, multi-master database, taking care of all the tasks needed to create identical tables in the chosen Regions and propagate ongoing data changes to all of them.
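
As a hedged sketch of the modern (2019.11.21) global tables workflow with AWS SDK v3, adding a replica Region is an UpdateTable call (the table name and Region below are placeholders):

const { DynamoDBClient, UpdateTableCommand } = require('@aws-sdk/client-dynamodb');

async function addReplica(tableName, region) {
  const client = new DynamoDBClient({});
  await client.send(new UpdateTableCommand({
    TableName: tableName,
    ReplicaUpdates: [{ Create: { RegionName: region } }],
  }));
}

addReplica('events', 'eu-west-1'); // hypothetical table and Region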

Scanning and Creating a Table in DynamoDB Using MuleSoft Connector

Creating a table in DynamoDB

Amazon DynamoDB is a fully managed NoSQL database service. While doing a recent project, I found that there was very little documentation about using the MuleSoft DynamoDB connector. In this article, I will share my experience of using the Mule DynamoDB connector so that it can help others. Please feel free to add your comments.


The Anypoint Connector for Amazon DynamoDB provides connectivity to the Amazon DynamoDB API. It enables you to create a database table that can store and retrieve any amount of data and serve any level of request traffic, automatically spreading the data and traffic over enough servers to handle the request capacity and the amount of data stored, all while maintaining consistent, fast performance.

Custom DynamoDB Docker Instance

Hey guys, I hope you all are doing well. I am back with another article on custom Docker instances for databases. In my last post, we saw how to run a custom Docker instance for MySQL. Similarly, in this post, we will see how to do the same with DynamoDB, so let's get started.

Just like in the previous article, I was working on a project with DynamoDB as the database, thanks to its many features like scalability and cloud storage. I wanted to test some things and did not want to mess with the cloud instance, so I thought I would make an instance of my own. So what to do?
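
A hedged sketch of such an instance with Docker Compose, using the official amazon/dynamodb-local image (the flags and volume path assume the image's standard Java entrypoint):

version: "3.8"
services:
  dynamodb:
    image: amazon/dynamodb-local
    ports:
      - "8000:8000"
    # Replace the default in-memory mode so data survives restarts.
    command: "-jar DynamoDBLocal.jar -sharedDb -dbPath /home/dynamodblocal/data"
    volumes:
      - ./dynamodb-data:/home/dynamodblocal/data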

AWS Resources That Should Be Backed Up

As many organizations have discovered first-hand, the consequences of data loss can be downright devastating, often resulting in prolonged downtime, significant damage to credibility, and major financial losses, both direct and indirect. While Amazon AWS has been heralded as a safer, more resilient alternative to on-premise computing, organizations must still think about how they can protect their AWS resources against loss by implementing a sound backup strategy.

Selecting AWS Resources for Backup

According to Amazon, AWS resources are all entities that an organization can work with, including EC2 instances, S3 buckets, and CloudFormation stacks. All AWS resources utilize a pay-as-you-go approach for pricing that’s similar to how utility companies charge for natural gas, water, and electricity.