Why Do We Need Databases and SQL?

SQL has a long and proven history. It has survived the NoSQL hype. Even if it is not perfect, it has proven to be the best available language for working with data. This is no surprise! The story began in the 1960s with the development of databases—an era marked by the introduction of the Integrated Data Store (IDS) at General Electric. However, it was Edgar Codd’s relational model that revolutionized data handling. His model, which organized data into a series of tables (or, more strictly, relations), has influenced database systems ever since. The following decade saw the birth of SQL (Structured Query Language), which became the standard language for interacting with relational databases, including MariaDB and others.

The Utility of Relational Database Systems

So, why do we need all this database stuff? Let’s imagine you’re building an app, maybe a simple to-do list to keep track of your daily tasks. Initially, you might think, “Why not just save each task directly to a file?” After all, your programming language has constructs and libraries to save and read data from disk. Also, implementing this seems straightforward: create a task, write it to a file; delete a task, remove it from the file. These are good points; however, as your app gains traction, users start to accumulate, and suddenly you have thousands of users trying to add, delete, and modify tasks simultaneously. At this point, the simplicity of files becomes a liability. Imagine one user is updating a task at the exact moment another tries to delete it. Or maybe two users are editing the same task at the same time. With a simple file-based approach, you’re likely to end up with corrupted or lost data because there’s no inherent mechanism to handle such conflicts.
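
To make that concrete, here is a minimal sketch of what a relational database gives you in this situation, assuming a hypothetical tasks table for the to-do app. Each statement is atomic, and the database isolates concurrent writers from each other instead of letting them trample a shared file.

-- Hypothetical schema for the to-do app
CREATE TABLE tasks (
  id INT PRIMARY KEY,
  title VARCHAR(255) NOT NULL,
  done BOOLEAN NOT NULL DEFAULT FALSE
);

-- Two users can issue writes like these at the same time; the database
-- locks the affected rows and serializes conflicting changes rather than
-- corrupting or losing data
BEGIN;
UPDATE tasks SET done = TRUE WHERE id = 42;
COMMIT;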

10 Websites to Learn Database and SQL in Depth

Hello Devs, SQL is one of the most essential skills in today's increasingly data-driven world. It's vital for any programmer, whether you're a Java, C++, Python, JavaScript, or Ruby developer, and for anyone learning data science and data analysis.

You may not believe it, but in my 20 years as a Java developer, I have found that almost 95% of Java applications use a relational database in their back end, and nearly all web applications do.

Efficiently Creating and Managing Views in SQL

As a developer, have you ever faced challenges finding a specific piece of information in a large set of SQL code? Or have you repeatedly created the same query for different reports? These are common problems that developers often encounter when working with SQL.

SQL views solve these issues by enabling developers to simplify intricate queries and create reusable templates for frequently used queries. However, creating and managing views can be difficult, particularly for beginners.

A Guide To Multithreading in SQL

Are you tired of staring at your screen, waiting for your SQL queries to finish running? Delayed query time is a common problem among database administrators and developers, but it doesn't have to be that way. Optimizing performance is crucial to the smooth functioning of any application, and multithreading can be a game-changer: it allows your database to execute multiple tasks concurrently and can significantly improve its speed and efficiency.

In this article, we'll dive deep into the world of multithreading in SQL, exploring various ways to implement it and the benefits it brings. We'll guide you through implementing and optimizing multithreading. We will provide you with a couple of examples and code snippets. For the more advanced users, we'll cover hot topics such as synchronization, parallel processing, and multithreaded transactions.

GraphQL, JavaScript, Preprocessor, SQL, and More in Manifold

We reached the final installment of our Manifold series but not the end of its remarkable capabilities. Throughout this series, we have delved into various aspects of Manifold, highlighting its unique features and showcasing how it enhances Java development. In this article, we will cover some of the remaining features of Manifold, including its support for GraphQL, integration with JavaScript, and the utilization of a preprocessor. By summarizing these features and reflecting on the knowledge gained throughout the series, I hope to demonstrate the power and versatility of Manifold.

Expanding Horizons With GraphQL Support

GraphQL, a relatively young technology, has emerged as an alternative to REST APIs. It introduced a specification for requesting and manipulating data between client and server, offering an arguably more efficient and streamlined approach. However, GraphQL can pose challenges for static languages like Java.

MaxScale for the Rest of Us, Part 3: Install and Configure MaxScale

This third post in the series of blogs about MaxScale finally gets to where you want to go: installing and configuring MaxScale. The first blog in this series was an overview of what MaxScale is, and the second was about how to set up a cluster of MariaDB servers, using MariaDB Replication, for MaxScale to access. But now it's time to introduce MaxScale.

If you skipped the second post because you already know how to set up MariaDB with Replication and all that, be reminded that I will use the same Linux server setup as outlined there for the MaxScale server and for a client to do some testing, and I recommend you stick with that for now. (For MariaDB itself you can use any relevant setup you want, MaxScale doesn't really care, but MaxScale is pretty new and has still not been tested on that many platforms, so try to stick to the CentOS 6.5 setup I propose.)

Why You Should Consider Graphs For Your Next GraphQL Project

This article is sponsored by Neo4j

The explosion of GraphQL over the past few years has introduced many front-end developers to the concepts of data modeling and storage, turning front-end developers into full-stack developers.

GraphQL provides developers with a simple contract with a database, guaranteeing consistency and predictability of the data returned while also managing persistence and data fetching. The developer trusts the API to store and retrieve the data most efficiently.

But convenience comes at a cost. One day, your side project hits the front page of Hacker News, and a sudden influx of users grinds your database to a halt. Sometimes, the remedy is as simple as using the right underlying database for the loads.

In this article, I will look at the Graph behind GraphQL and demonstrate why Neo4j is the best fit for your next project.

The Graph In GraphQL

GraphQL itself is a database-agnostic query language. Many database companies and startups now offer libraries that convert a GraphQL query or mutation into a query language that works with the underlying data store, whether that be SQL for relational databases, Cypher for graph databases, or any number of proprietary query languages.

Graphs provide a natural way to represent data, where Nodes (or vertices) that represent entities or things are connected together by Relationships (or edges). Depending on the underlying data storage in your GraphQL library of choice, a certain amount of gymnastics may be involved. Suddenly, unnatural tables with strange names are created, or data is duplicated to improve query response times, introducing technical debt along the way.

This is where Neo4j comes in. Neo4j is a native Graph Database. Graph Databases are in a category all of their own, and for a good reason.

Graph databases treat the connections between data as first-class citizens, storing relationships so that highly connected datasets can be queried in real-time.

An Example: Movie Recommendations

Say we’re sick of scrolling through an endless list of thumbnails on our favorite streaming platform looking for something to watch. We decided to build a new website where users can register, provide movie ratings, and in return, receive movie recommendations based on users who have similar ratings.

In our GraphQL schema, we define types that represent movie information. Users provide movie ratings, each with a score between 1 and 5. Movies can have one or more actors and one or more directors. Movies are also tagged with one or more genres.

Note: Luckily, this Recommendations dataset already exists as a free Neo4j Sandbox. Neo4j Sandbox instances are free of charge, initially run for three days, and can be extended up to 10 days.

type User {
  userId: ID!
  name: String
  email: String
  ratings: [Rating]
}

type Rating {
  user: User!
  movie: Movie!
  rating: Int
  createdAt: Date
}
type Movie {
  movieId: ID!
  title: String
  released: Date
  actors: [Role]
  directors: [Person]
}

type Role {
  person: Person!
  movie: Movie!
  roles: [String]
}
type Person {
  personId: ID!
  name: String!
  born: Date!
  roles: [Role]
  directed: [Movie]
}

Let’s take a look at how this data will be stored in a relational database, a document store, and a graph and see where we might hit a problem when trying to generate recommendations.

In A Relational Database

Relational databases provide a structured method of data storage where data is organized into tables. Tables conform to strict rules known as a database schema, where each row contains a set number of columns, each with a set data type. Where a value may not exist, nullable columns can be used.

The underlying database schema provides a perfect base to map GraphQL Type Definitions. Each field within a type definition will map one-to-one to a column. Those Type Definitions can be quickly translated into an SQL query (SQL stands for Structured Query Language) to insert or retrieve data.

A JOIN is constructed at read-time for nested types, joining two tables using foreign keys to find the corresponding records in a database. Here comes the first potential problem.

Let’s look at an Entity Relationship Diagram (ERD) that describes how the data may be stored in a relational database.

The tables highlighted in yellow represent the main entities in the data model: users, people, and movies. The tables highlighted in green represent the JOIN tables required to facilitate the many-to-many relationships between the entities.

There are two potential pitfalls here. First, let’s talk about naming. The example above is fairly straightforward, but say we have a many-to-many relationship in our data model between Products and Orders — an order may contain one or more products, and a product may appear in many orders. What do we call that table? order_products, order_line? This feels unnatural, and instantly you are adding tribal knowledge to the database, making it harder for others to understand.
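
Whatever name we settle on, the join table itself is pure plumbing. Here is a hedged sketch, assuming orders and products tables already exist:

-- The table exists only to model the many-to-many relationship
CREATE TABLE order_products (
  order_id   INT NOT NULL,
  product_id INT NOT NULL,
  PRIMARY KEY (order_id, product_id),
  FOREIGN KEY (order_id)   REFERENCES orders (id),
  FOREIGN KEY (product_id) REFERENCES products (id)
);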

Second, when you use one of those JOIN tables to find, say, the actors in a particular movie, you start to hit the O(n) problem.

JOINs & The O(n) Problem

GraphQL is designed to be a flexible query language that allows you to retrieve arbitrarily nested values. The more nested items retrieved, the more JOINs are required, and therefore the longer the query takes. Furthermore, the more data added to the database, the larger the underlying indexes become and the longer the query will take to return a result.

This is commonly expressed in Big O notation as O(n): the computational resources required to compute the JOINs grow with the size of the input data. The more data added to the database, the more data needs to be processed, and the slower the database becomes.

Many relational databases support subqueries or window functions, but these must still be constructed in memory at query time, which can be an expensive operation.

This problem can be partially resolved by database tuning, partitioning, or denormalizing data to improve response times, at which point you’ll need to become a database expert.

In A Document Store

Document stores, such as MongoDB or CouchDB, differ from Relational databases in that they are designed to store unstructured or semi-structured data. Data is organized into collections, each of which consists of many documents. Each document in a collection represents a single record, which can have its own unique set of key-value pairs. This approach is more flexible than relational databases, but as the data is schema-less, you must be careful to enforce consistency through your application layer.

You would most likely create collections to store users, movies, and people.

Data Duplication for Query Performance

Document stores can also fall foul of the O(n) problem. NoSQL databases, in general, are designed to provide their own solutions to the problems of read and write performance.

A common approach to solve the O(n) problem is to duplicate data across collections to speed up query responses. For example, the movies collection may store directors as an array of string values.

{
  "_id": ObjectId("63da26bc2e002491266b6205"),
  "title": "Toy Story",
  "released": "1996-03-22",
  "directors": ["Tom Lasseter"]
}

This is perfect if you only want to display the data within a UI. But if we need to ask more complex questions, for example, how many movies has Tom Lasseter directed? — things start to get complicated. Do we loop through every movie record and check the directors array for a name? What if two directors share the same name?

If you want to query across collections, you would usually store a reference to the unique ID of the record in the corresponding collection. Take the user example below: the ratings for that user can be stored as an array against the user document, making it easy to access. Each rating contains a reference (in this case, a MongoDB DBRef to reference the ObjectId of the document in the movies collection).

{
 "_id": ObjectId("63da267a89f7381acf7ab183"),
 "email": "john.doe@example.com",
 "name": "John Doe",
 "ratings": [
   {
     "movie": {
       "$ref": "movies",
       "$id": ObjectId("63da2681680f57e194eb3199"),
       "$db": "neoflix"
     },
     "rating": 5
   },
   {
     "movie": {
       "$ref": "movies",
       "$id": ObjectId("63da26b613fe29cf79d92e2f"),
       "$db": "neoflix"
     },
     "rating": 3
   }
 ]
}

Document stores support pipelines or map-reduce functions that allow you to compute the JOIN at read time. But these can quickly become unwieldy and hard to reason about, and they take time to compute. These read-time JOINs also fall victim to the O(n) problem. Each reference must be looked up in an index to find the corresponding record, which must also be decoded. The larger the collection, the larger the index and the longer each lookup may take. Multiply that time and complexity by the number of nested items, and all of a sudden, we’ve got a slow and complicated pipeline or map/reduce function.

To avoid this complexity, you could also store some of the required properties for the movie, for example, the movie title, in the rating object.

You may also store the movie title as a key in the rating to balance out ease of readability and data duplication. But now we also have to make difficult decisions about what data to duplicate to speed up reads. If the use case changes in any way, a mountain of work is required to fit the new use case. What if we want to query from movie to rating?

You may also want to fan out your writes, duplicating data across collections to speed up the read-time performance, but that also comes with its own maintenance headaches and a whole load of potential for technical debt.

The Case for Graphs

Now, let’s look at this data as a graph. The data structure of Nodes and Relationships fits this problem well. Rather than creating JOIN tables to handle many-to-many relationships or storing duplicated data for reference, the verbs in the use case are stored as relationships:

More Natural Modeling

The data model above is easier to understand and reason about. At a quick glance, you can see that a User may have one or more REVIEWED relationships pointing to a Movie node. Nodes and relationships can both contain properties stored as key-value pairs. We can use this to store the rating and createdAt properties of the review directly on the relationship.

Constant Query Times

Remember how I mentioned earlier that relationships are treated as first-class citizens? When a relationship is created in Neo4j, a pointer is appended to the node at each end of the relationship, ensuring that every node is aware of every relationship going out from or coming into it.

This enables the query engine to look up relationships quickly without relying on an index. It also means that query response times stay proportional to the amount of the graph touched during the query rather than to the overall size of the data.

Querying a Neo4j graph is also different from relational databases and document stores. Neo4j uses a proprietary language called Cypher. Cypher is similar in structure to SQL, but instead of starting with a SELECT statement and using JOINs to combine data, a Cypher statement begins with a MATCH clause, which defines a pattern of data to return.

Neo4j will then parse the query, examine the database schema and use database statistics to determine the most efficient way to traverse the pattern. Regardless of the way the pattern is written, the query will be executed in the same way.

Let’s look at the SQL and Cypher statements required to retrieve the data side by side. Both queries will find the names of actors from the movie The Matrix.

SQL:

SELECT p.name, p.born, r.roles, m.title
FROM people p
INNER JOIN roles r ON p.id = r.person_id
INNER JOIN movies m ON r.movie_id = m.id
WHERE m.title = 'The Matrix'

Cypher:

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE m.title = 'The Matrix'
RETURN p.name, p.born, r.roles

In a Cypher statement, you use an ASCII-art style syntax to draw the pattern you would like to read from the graph. Nodes are surrounded by parentheses ( ( and ) ), and relationships are drawn using dashes and an arrow to represent the direction. This declarative approach differs from a Pipeline in MongoDB, where you must express exactly how the data should be retrieved.

This is only a trivial example, but the more complex the use case becomes, the more a Cypher statement comes into its own. I have shown Cypher statements to business owners, architects, and even C-level executives, who have all quickly understood what the statement is doing, which cannot be said for an SQL statement and certainly cannot be said for a pipeline.

Suddenly the barrier to data engineering doesn’t seem so high.

Conclusion

My mantra has always been to use the best tool for the job, particularly when it comes to databases. You may feel that my opinion is a little biased, as I am literally paid to have this opinion. But since first installing Neo4j around a decade ago, I’ve started to see the value of connections everywhere.

A simple network of nodes and relationships is surprisingly powerful when storing data. Graph databases spare you much of the additional work of reshaping your use case to fit a database, and they handle performance and scale naturally.

If you would like to learn more about Neo4j, you can check out Neo4j GraphAcademy, where we have constructed Beginner courses that will give you the confidence to import and query data in Neo4j, and Developer courses that will show you how to connect to Neo4j using one of the five official drivers: Java, JavaScript, Python, .NET, and Go.

You can create an AuraDB Free instance pre-populated with data, which will hold 200k nodes and 400k relationships, and it’s free for as long as you need.

So, if you are working with a complex, highly connected dataset or would like to futureproof your project against complicated database migrations and refactoring in the future, why not put the Graph into GraphQL?

How to Improve the Performance of Large WordPress Sites

If you’re running a larger WordPress site, you may be facing performance challenges that can impact user experience and search engine rankings. Fortunately, there are several steps you can take to improve your site’s speed and performance. This article covers some of the most common performance challenges for larger WordPress sites and provides tips and solutions for addressing these.

Growing pains… some should be so lucky to have them!

Let’s talk about how to manage a WordPress site that is growing too quickly too soon and causing you or your clients all sorts of pains and problems.

If the issue is temporary, such as dealing with an unexpected traffic spike from a post gone viral (another thing we should be so lucky to experience!), then knowing how to scale your WordPress site when traffic soars can fix this.

However, if problems persist, it may take more than a couple of aspirins and calling the doctor in the morning to make the headaches go away.

In this article, we’ll cover the challenges of WordPress enterprise development and the most common performance issues affecting larger WordPress sites.

WordPress Enterprise Development Challenges

Q: How complex can you make a WordPress site?

A: Very.

When it comes to building large and complex sites, WordPress’s capacity to handle it is not an issue. As WordPress enterprise developer and global SME business adviser Mario Peshev states in his excellent article on building large and complex sites using WordPress:

“WordPress is a proven CMS that handles various applications handling millions of users and tens or even 100M views a month.”

As Mario also states…

“Scaling from 10M to 50M is feasible, 50M to 100M is challenging, 100M–200M is quite complex and 200M+ may require some serious engineering effort.”

So, the capacity of the WordPress CMS platform to handle large and complex sites is not a problem.

The issue is having the skills to handle WordPress enterprise development challenges.

As most developers know, WordPress is not only a widely popular content management system known for its flexibility, ease of use, and affordability, but it is also an excellent platform for small businesses and startups that want to establish a web presence quickly and easily.

However, when it comes to enterprise-grade WordPress development, the amount of information available is as scarce as a developer who hasn’t resorted to cursing loudly at their code editor at least once.

So, before we get into diagnosing the challenges and issues of dealing with large WordPress sites, let’s explore some of the challenges of finding relevant information on WordPress enterprise development.

Here is a summary of the points Mario Peshev makes in his article…

Scarcity of Information on Enterprise-Grade WordPress Development

One of the main reasons why information on enterprise-grade WordPress development is scarce is that only a handful of agencies specialize in building WordPress platforms, applications, plugins, or performing migrations and integrations for the enterprise.

Most vendors specialize in small business websites, and only a small chunk of service providers work with enterprises.

Furthermore, those consultants and agencies often don’t have the time and resources to write tutorials and share their know-how with the industry, or they just don’t care, especially more hardcore engineers who don’t want to bother.

Another reason why information on WordPress enterprise development is limited is that WordPress is often not the core application that enterprises use in the first place. That’s another obstacle for many: WordPress may be just the front-end interface, perhaps 1% of the overall platform, with the main application running behind the scenes.

However, WordPress developers who want to bid on enterprise projects can focus on several different areas to enhance their expertise.

Focus on Different Areas for Enhancing Expertise

The first area that WordPress developers should focus on is studying the WordPress Core, APIs, and the surrounding ecosystem in-depth. This will give developers a deeper understanding of the platform and how it works.

They should also make sure that they’re comfortable with WordPress coding standards and best practices. This will ensure that the code they write is maintainable and easy to read.

The second area that WordPress developers should focus on is practicing in the main technical areas that enterprises care about, such as performance, security, scalability, and backward compatibility.

Enterprises have high expectations, and it’s essential to demonstrate that you have the expertise to meet their requirements.

These WordPress development resources will help you gain these valuable skills and expertise:

Strategic Players in the Field

Hosting vendors are strategic players in the field and occasionally work with high-scale applications. Developers can browse their resources and follow their blogs, knowledge base articles, and the like. WordPress is a platform built on top of PHP and SQL, with a front end served through HTML, CSS, and JavaScript. It runs on a web server like Apache or Nginx using mod_php or php-fpm, connected to a MySQL database on a Linux server.

Most of the heavy lifting for enterprises happens on top of those layers. Therefore, it makes sense to dive deeper into the communities and resources that focus on those topics.

Follow WordPress Core Contributors and Employees

It always helps to follow WordPress Core contributors, employees at enterprise-grade companies, and the blogs of the leading agencies working with enterprises. You may find some relevant case studies, interviews with clients, or other top engineers that could help you improve even further.

Now that we’ve looked at the first challenge, which is acquiring the expertise to handle large and complex WordPress sites and meet the expectations of enterprises, let’s turn to addressing common performance issues you may experience working with large WordPress sites.

Common Performance Challenges for Large WordPress Sites

WordPress is used by some of the biggest and most well-known companies, celebrities, and brands in the world, like Intel, Pepsi Cola, PlayStation, American Express, TechCrunch, Fisher-Price, Beyonce, Justin Timberlake, Usain Bolt, and many more.

Someone has to look after these large sites… why not you?

While browsing through WPMU DEV’s member forums (which, by the way, is a treasure trove of information for web developers), I came across this post from WPMU DEV member Charly Leetham, which I am reproducing in full below:

***

I was contacted by a long-term client asking for assistance with their client.

The end customer is setting up a rather large website in WordPress and they were having no end of difficulties in keeping the site running. It was so bad that they had to reboot their Amazon EC2 instance regularly (several times a day).

With trepidation I agreed to take a look and see if I could help. What I found has left me … saddened. For the client, mostly.

The site:

  • Database: 4 Gigabytes (after optimization)
  • Posts / Pages and other content: Over 900K entries.

This is not a small site.

It was built in Elementor which initially left me concerned, as I know that Elementor is resource hungry.

The EC2 instance was provisioned with 140 Gig storage and 32 Gig memory. More than enough, right? One would think so.

The business had been moved to EC2 by a consultant who had promised them it would improve their performance. Then they told them that the reason the instance kept hanging was because of the high number of transients that were being created.

They created a cron job that deleted the transients every hour, with very little improvement.

I’ve found a number of things during my investigations but the three most concerning things are:

1. Although the server was provisioned with 32G of memory, PHP had been limited to 2G and WordPress had been limited to 40M.

It’s no wonder they were having trouble.

Increasing these limits has stopped the hanging but we’re still experiencing memory overflows.

2. The database was provisioned on the same server.

Splitting the database onto an RDS (remote database server) should provide more performance increases.

3. No optimization or performance improvement work had been done.

By implementing Hummingbird, I’ve been able to improve the load time of the site and that’s without doing anything really hard core. That’s still to come.

The main thing I want to highlight for others here is the incremental knowledge you bring to the table when working with clients.

Yes, people can build their own WordPress sites but few people can really make them hum. That takes experience and a lot of work.

***

Charly’s forum post is a great example of some of the typical performance challenges you can expect when working with larger WordPress sites and provides a number of useful insights into handling these.

To address these challenges, let’s first summarize the main technical issues Charly described when looking at this client’s site:

  1. The end customer is setting up a rather large website in WordPress with over 900k entries and a 4GB database after optimization, which is not a small site.
  2. The website was built in Elementor, which is resource-hungry and requires a lot of server resources.
  3. The EC2 instance was provisioned with 140GB storage and 32GB memory, but PHP had been limited to 2GB and WordPress had been limited to 40MB, causing performance issues and memory overflows.
  4. The database was provisioned on the same server, which caused performance issues. Splitting it onto a remote database server should provide performance improvements.
  5. No optimization or performance improvement work had been done. By implementing Hummingbird, Charly was able to improve the site’s load time.
  6. The incremental knowledge and experience brought to the table by an experienced web developer is crucial for optimizing and improving the performance of WordPress sites, which can be complex and require a lot of work to make them run smoothly.

We’ve already addressed point #6, so let’s go through the other issues on the list above.

Large WordPress Site Performance Issue #1 – WordPress Database

As your WordPress site grows, so does the size of its database. Your WordPress database can become quite large and may start causing some issues.

Managing a large WordPress database can be a daunting task, so let’s take a look at some of the challenges, best practices, strategies, and solutions for managing your WordPress database on larger sites.

The challenges of having a large WordPress database include:

  • Slow page load times: A large database can slow down your website, making it difficult for visitors to load pages quickly.
  • Backup and restore issues: Backing up and restoring a large database can be a challenge, and it may take a long time to complete the process.
  • Database corruption: A large database can be more prone to corruption, which can cause data loss and other issues.
  • Difficulty in database maintenance: Maintaining a large database may require more resources and expertise to keep it running smoothly.

Here are some strategies and best practices for managing WordPress databases on larger sites:

Initial Configuration

Before you even start thinking about managing your database, it’s important to make sure that it’s set up correctly. When you install WordPress, it creates a new database for you. However, if you’re running a large site, you may want to consider using a separate database server. This will help to improve performance and reduce the load on your web server.

When configuring your database, it’s important to choose the right settings. In particular, you’ll want to pay attention to the database character set and collation. These settings can affect how your content is displayed on your site, so it’s important to get them right from the start.
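
For example, on a MySQL-backed site you can set the character set and collation at the database level and convert existing tables to match. This is a minimal sketch, assuming the database is named wordpress; utf8mb4 with a Unicode collation is the usual choice for modern WordPress installs:

-- Set the default character set and collation for the database
ALTER DATABASE wordpress CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Convert an existing table to match (repeat for the other tables; back up first)
ALTER TABLE wp_posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;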

Where to Keep the Databases

When managing a large WordPress site, you’ll want to think carefully about where to keep your databases.

There are a few different options to consider:

  • Local Database: You can keep your database on the same server as your website. This is the simplest and most common option, but it can lead to performance issues as your site grows, as Charly referred to in the client example above.
  • Remote Database: You can keep your database on a separate server, either within your own network or in the cloud. This can improve performance, but it can also increase costs.
  • Managed Database: You can use a managed database service, such as Amazon RDS or Google Cloud SQL. This can be a good option if you don’t have the expertise to manage your own database.

Database Access Time with Large Numbers of Records

As your WordPress site grows, the size of your database can have an impact on how quickly your site loads.

When you have a large number of records in your database, queries can take longer to run, which can slow down your site.

Caching can help speed up your website by storing frequently accessed data in memory, reducing the need to access the site’s database and PHP. This, of course, depends on the kind of caching being used, e.g. database caching (which includes object caching) or page caching (where the cache of the web pages is stored and presented when a specific page is requested later without needing to be processed by PHP and MySQL).

To improve performance, you can use server-side caching, caching plugins that manage server-side caching solutions, or standalone caching plugins. Our performance-optimizing plugin Hummingbird, for example, has its own caching but also integrates with WPMU DEV’s server-side caching.

Caching can have a significant impact on site performance, particularly for larger sites. However, setting up and managing caching can be complex and time-consuming.

Also, it’s important to regularly monitor your site’s performance to ensure the caching is optimized for your specific needs.

To learn more about caching solutions, check out our Ultimate Guide to WordPress Caching.

Another option is to use a technique called “sharding,” which involves splitting your database into smaller pieces. This can help to improve performance by spreading the load across multiple servers.

Techniques for Splitting the Data Up

If you’re using a technique like sharding, you’ll need to decide how to split your data up. One option is to split your data by category or tag. For example, you could have one database for posts related to technology, and another for posts related to entertainment.

Another option is to split your data by date. This can be particularly useful if you have a lot of older content that doesn’t change very often. You could have one database for posts from the last year, and another for older posts.
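
A related approach within a single MySQL server is native table partitioning, which splits one table's storage by date. Here is a hedged sketch using a hypothetical posts_archive table (stock WordPress tables can't be partitioned this directly, because the partitioning column must be part of every unique key):

-- Hypothetical archive table partitioned by year
CREATE TABLE posts_archive (
  id BIGINT UNSIGNED NOT NULL,
  post_date DATETIME NOT NULL,
  post_content LONGTEXT,
  PRIMARY KEY (id, post_date)
)
PARTITION BY RANGE (YEAR(post_date)) (
  PARTITION p2021 VALUES LESS THAN (2022),
  PARTITION p2022 VALUES LESS THAN (2023),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

Queries that filter on post_date then only touch the relevant partitions.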

Consider also using a plugin like HyperDB. HyperDB is maintained by Automattic, the company behind WordPress.com.

As described on the plugin page…

HyperDB allows tables to be placed in arbitrary databases. It can use callbacks you write to compute the appropriate database for a given query. Thus you can partition your site’s data according to your own scheme and configure HyperDB accordingly.

Basic Indexing

Indexing your database can help to improve performance by making it faster to search for data. When you create an index, the database creates a data structure that makes it easier to search for specific values.

To create an index, you’ll need to use the MySQL command line or a tool like phpMyAdmin.

When you’re creating an index, it’s important to choose the right columns to index. Typically, you’ll want to index columns that are frequently used in queries.
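
As a rough illustration, assuming a hypothetical custom table named orders with a customer_email column used in frequent lookups, creating and verifying an index might look like this:

-- Add an index to a column that appears in frequent WHERE clauses
CREATE INDEX idx_customer_email ON orders (customer_email);

-- Verify that the query planner now uses the index
EXPLAIN SELECT id FROM orders WHERE customer_email = 'alice@example.com';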

You can also use a plugin like Index WP MySQL for Speed. This plugin adds database keys (also called indexes) to your MySQL tables to make it easier for WordPress to find the information it needs. The plugin page also includes excellent information on database indexing in relational database management systems.

Settings and Logs to Check

To keep your database running smoothly, there are a few settings and logs that you’ll want to keep an eye on. These include:

  • MySQL slow query log: This log records queries that take longer than a certain amount of time to run. By analyzing this log, you can identify queries that are causing performance issues (a sketch for enabling it follows this list).
  • MySQL error log: This log records any errors that occur in the MySQL server. By monitoring this log, you can identify and troubleshoot issues that may be affecting your database.
  • WordPress debug log: This log records any errors or warnings that occur within WordPress. By monitoring this log, you can identify issues with your WordPress installation or plugins.
  • Database backups: Regularly backing up your database is important to ensure that you don’t lose any data in case of a server crash or other disaster and can restore your website quickly in case of a problem. You can use a plugin like Snapshot to automate this process, or if you’re hosting with WPMU DEV, you can configure automatic enterprise database backups to perform daily and even hourly. Also, consider storing all backups separately from the server hosting the site, as the backups may be lost if the server crashes.
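
For reference, here is a minimal sketch of enabling the MySQL slow query log mentioned in the first item above; the threshold and file path are illustrative values, and in production you would normally persist these settings in my.cnf:

-- Requires admin privileges; log any query slower than 2 seconds
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 2;
SET GLOBAL slow_query_log_file = '/var/log/mysql/mysql-slow.log';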

Other Ongoing Maintenance

In addition to the above, there are a few other ongoing maintenance tasks that you’ll want to perform to keep your database running smoothly.

These include:

  • Cleaning up your database: Over time, your database can become cluttered with unused data. Check our article on how to clean up your database and remove unnecessary data for more details.
  • Optimizing your database tables: Reducing the size of your database and optimizing your database tables helps to improve site performance. You can optimize your database by removing unnecessary data, such as post revisions, trashed items, spam comments, and unused plugins and themes. Check our complete WordPress database optimization guide for detailed instructions and plugins that help you do this (a small SQL sketch follows this list).
  • Monitoring your site for security issues: Large sites are often a target for hackers. You can use a plugin like Defender to monitor your site for security issues and prevent attacks.
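
As a rough sketch of what the table cleanup and optimization mentioned above looks like at the SQL level (a plugin will also clean up related metadata; always back up before running statements like these):

-- Remove stored post revisions (related metadata rows are not touched here)
DELETE FROM wp_posts WHERE post_type = 'revision';

-- Reclaim space and defragment the largest tables after big deletes
OPTIMIZE TABLE wp_posts;
OPTIMIZE TABLE wp_postmeta;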

In terms of cleaning up your database, Charly mentions a high number of transients as being a possible issue affecting the site’s performance. Although addressing this issue seemed to offer very little improvement in Charly’s client’s case, it’s worth mentioning it here as something to check if you are experiencing issues with your site.

Transients are a type of cache that stores data in the database for a specific period of time. They are used to speed up the loading time of a website by storing the results of a complex or time-consuming query, such as an API request, so that the query doesn’t have to be run every time the page is loaded.

Transients have a set expiration time, after which they are automatically deleted from the database. However, if the website is not properly optimized, transients can accumulate in the database and cause performance issues, such as slow page loading times or database crashes.

To optimize WordPress and avoid issues with transients, there are several steps that can be taken. These include:

  • Use a caching plugin: A caching plugin like Hummingbird can help reduce the number of database queries and prevent unnecessary creation of transients.
  • Delete expired transients: Expired transients can accumulate in the database, so it’s important to regularly delete them to keep the database optimized. This can be done manually (see the SQL sketch after this list), or by using a plugin like Hummingbird.
  • Set a maximum lifetime for transients: By setting a maximum lifetime for transients, you can prevent them from being stored in the database for too long, which can lead to performance issues. This can be done using the set_transient() function in WordPress.
  • Use a remote database: Storing the database on a remote server can help reduce the load on the server and prevent issues with transients.
  • Increase the memory limit: Increasing the memory limit for PHP and WordPress can help prevent memory overflows and performance issues caused by transients.
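
For reference, the manual transient cleanup mentioned above boils down to something like the following statement (a sketch only; plugins such as Hummingbird also remove the paired _transient_ rows for you):

-- Delete timeout markers for transients that have already expired
DELETE FROM wp_options
WHERE option_name LIKE '_transient_timeout_%'
  AND option_value < UNIX_TIMESTAMP();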

No matter what size WordPress site you are working on, using WPMU DEV’s Hummingbird caching and site optimization plugin can help to automatically take care of expired transients and eliminate this issue, leading to faster page loading times and a smoother user experience.

Hummingbird: Advanced Tools screen with Database Cleanup and Transients options highlighted.
Hummingbird can be configured to automatically delete expired transients from your WordPress database.

In terms of increasing the memory limit for PHP, if you are a WPMU DEV member, it’s really easy to check a whole bunch of information about your WordPress site, including current PHP memory limits and max filesize upload settings.

Just log into your WordPress dashboard and navigate to the WPMU DEV dashboard plugin menu. Select Support > System Information > PHP tab.

WPMU DEV Dashboard plugin - Support tab.
WPMU DEV’s Dashboard plugin lets you easily check information about your WordPress site.

If you are not a WPMU DEV member, you can still check this information manually.

To find out how much PHP memory is allocated, create a PHP file and add the following:

<?php
phpinfo();
?>

Call it something like php-test.php and upload it to your server.

Access the file from a browser and search for memory_limit. This will give you two settings – what the local site setting is and what the server default is. It is possible for different sites to have different PHP memory_limit values.

For WordPress memory, for instance, you might see the following:

define('WP_MEMORY_LIMIT', '64M');

Note that if this entry is missing in the wp-config.php file, then your site is probably working between 40M and 64M.

In addition to the above, make sure to also scan and fix corrupt or broken files and database tables in WordPress.

As you can see, there are quite a number of things you can do to improve the performance of your WordPress database.

Let’s move on, to…

Large WordPress Site Performance Issue #2 – WordPress Core, Themes, And Plugins

Charly mentions that another possible reason for the performance issues her client’s site was experiencing was using a resource-hungry theme.

Rather than focusing on a particular theme, let’s look at themes and plugins in general (btw… if you use Elementor, check out our article on how to optimize Elementor themes. We’ve also written articles on ways to optimize themes like Divi, WPBakery, Astra, and other page builders.)

Here are some of the things you can do:

Theme and Plugin Bloat – Themes and plugins can significantly impact the performance of a WordPress site, particularly if they are not optimized or updated regularly. Some themes and plugins can also be poorly coded, leading to slow loading times and site bloat.

Solution: Be sure to choose a lightweight and optimized theme that is regularly updated by the developer. Avoid using too many plugins and remove any unnecessary ones to reduce site bloat. Always keep your themes and plugins up-to-date to ensure optimal performance.

  • Avoid poorly coded themes and plugins, as these can lead to slow loading times, site bloat, and conflicts.
  • Choose lightweight and optimized themes and plugins that are regularly updated by their developer.
  • Check your server logs to identify heavy plugins and themes that could be slowing down your site.
  • Always keep your themes and plugins up-to-date to ensure optimal performance.
  • Deactivate and remove unnecessary and non-essential plugins and themes.

As with all WordPress sites, regardless of size, it’s also really important to optimize your client sites.

There are a number of tools you can use to scan your site and measure site performance, including Google PageSpeed Insights and GTmetrix. These tools provide important insights into ways to optimize your sites.

You can also use a developer tool plugin like Query Monitor to help you identify issues, aid in the debugging of database queries, PHP errors, hooks and actions, block editor blocks, enqueued scripts and stylesheets, and HTTP API calls. The plugin also provides advanced features such as debugging of Ajax calls, REST API calls, and user capability checks.

Query Monitor - WordPress plugin
Use Query Monitor to quickly identify poorly performing plugins, themes, or functions in your WordPress site.

Additional articles and tutorials that we recommend checking out include our guide on speeding up WordPress, solutions to forgotten WordPress page speed problems, WordPress troubleshooting guide, and Mario Peshev’s article on scaling mistakes when running a large WordPress site.

Large WordPress Site Performance Issue #3 – Site Content

Large WordPress sites typically have loads of content. In Charly’s case, for example, the client’s website had over 900k entries.

If you’ve gone and optimized the database and you’re still experiencing issues, here are some of the things you can look at:

  • Perform a content audit: A content audit is essentially performing an inventory of your existing content and assessing and identifying content that’s outdated, obsolete, duplicated, etc, before deciding what to do with it (e.g. update, SEO optimize, trash). It’s a long-term but effective and important strategy for keeping your site’s content manageable and maintained.
  • Use lazy loading: Lazy loading can help to ensure that media files are only loaded when they are needed, which can significantly improve page load times.
  • Use a content delivery network (CDN): Consider using a content delivery network (CDN) to distribute cached media files and reduce the load on your server. A CDN can help speed up your website by caching your website’s content on servers located around the world, reducing the load on your server. Popular CDNs include Cloudflare and MaxCDN. Note that all WPMU DEV membership and hosting plans include a CDN. Our Hummingbird and Smush plugins also include a CDN (Hummingbird also offers Cloudflare integration).
  • Use content optimization plugins: Optimize images, videos, and other media files by compressing them and reducing their file size. If the site contains loads of images, consider using an image optimization plugin like Smush, which significantly reduces image file sizes without compromising on image quality to improve content delivery performance. Smush also includes WPMU DEV’s CDN.
  • Use a managed WordPress hosting service: A managed WordPress hosting service can provide you with optimized servers and database management tools to help keep your website running smoothly. As discussed in the next section below, WPMU DEV not only offers a best-of-class managed WordPress hosting service, but it is also specifically configured to deliver enterprise-level hosting for WordPress sites of all kinds and sizes.

Large WordPress Site Performance Issue #4 – Hosting

If you are still experiencing problems with the site after fixing issues with the WordPress database and optimizing the site’s core, plugins, themes, and content, the issue may be related to web hosting.

Consider using a managed WordPress hosting service with a company that specializes in WordPress.

Hosting with a reputable host not only means placing your site in the care of an experienced team who will handle areas like server optimization and database management for you, but also means having that team migrate your existing website to their servers.

This is very important, as a large WordPress site no doubt has lots of moving parts and active traffic and transactional events taking place, and you don’t want to lose any valuable data or break anything during the migration process.

Additional hosting considerations for a large WordPress site include the ability to handle demands with ample resources, uptime, speed, and customer support.

WPMU DEV offers enterprise-level hosting, 24/7 expert hosting and WordPress support, and migrations by a team of experts who will handle everything for you, including troubleshooting any potential issues with your site.

Additionally, WPMU DEV has been independently rated and reviewed by many users as one of the leading managed WordPress hosting companies, with a near-perfect rating score. G2.com, for example, rates WPMU DEV 4.8 out of 5 stars overall, and 9.8 out of 10 for quality of support.

More importantly, and on a practical level, our expert team proactively manages larger sites by regularly checking areas like PHP error logs for any errors in the plugins, themes, or the WordPress core; PHP slow logs for slow-loading scripts (e.g. plugins whose scripts take more than 30 seconds to execute); access logs (to see if there’s a DDoS attack or a general spike in visitors); and the load on server resources, including CPU, RAM, etc.

The team also checks that the WAF is enabled, caching is on, and any unused profiling software is turned off when not needed, and will perform conflict tests for plugins and themes and run query monitoring scans at the MySQL level when required.

We also offer integration with New Relic and Blackfire to profile the site and its pages for all sites, large and small.

Managing Larger WordPress Sites Is A Big Job

A large WordPress site differs from other WordPress sites mostly in the scale and complexity of its management.

Dealing with performance issues in large, complex WordPress sites requires having the skills and the expertise to handle challenges and meet the high expectations of enterprise clients.

Finding information on WordPress enterprise development can be challenging, but focusing on different areas like studying the WordPress Core, APIs, and the surrounding ecosystem, practicing in the main technical areas, and following leading agencies, will help you become more knowledgeable and confident in your abilities as a developer.

Also, managing a large WordPress database can be challenging but there are solutions available to help you manage it. By optimizing your database, using caching and CDN services, using a managed WordPress hosting service, and regularly backing up your database, you can ensure that your website runs smoothly and avoid potential issues.

By addressing common performance challenges and regularly monitoring your site’s performance to identify and address any issues as they arise, you can significantly improve the performance of your larger WordPress site.

Finally, hosting your site on enterprise-level servers with an experienced and reliable managed WordPress hosting partner like WPMU DEV will not only improve your large site’s performance but also help to eliminate problems and issues, as your site will be expertly managed and monitored 24/7.

If you are looking to migrate your existing site from another host or upgrade hosting for a large WordPress site, we recommend looking at our enterprise-level hosting plans (3 x Essential and 3 x Premium options), and taking advantage of our hosting buyout and free expert site migration service.

***

Ask Charly Leetham

Special thanks to WPMU DEV member Charly Leetham for her contribution to this post. AskCharlyLeetham provides ongoing support and advice to help small businesses define and refine their processes and plan and build their web presence using current and emerging technologies.

SQL Extensions for Time-Series Data in QuestDB

In this tutorial, you are going to learn about QuestDB SQL extensions which prove to be very useful with time-series data. Using some sample data sets, you will learn how designated timestamps work and how to use extended SQL syntax to write queries on time-series data.

Introduction

Traditionally, SQL has been used for relational databases and data warehouses. However, in recent years there has been an exponential increase in the amount of data that connected systems produce, which has brought about a need for new ways to store and analyze such information. For this reason, time-series analytics have proved critical for making sense of real-time market data in financial services, sensor data from IoT devices, and application metrics.
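
As a small taste of what those extensions look like, the query below uses SAMPLE BY to aggregate rows into hourly buckets. It assumes a hypothetical trades table whose designated timestamp column is named timestamp:

-- Average price per hour, computed over the designated timestamp
SELECT timestamp, avg(price) AS avg_price
FROM trades
SAMPLE BY 1h;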

Avoid Data Silos in Presto in Meta: The Journey From Raptor to RaptorX

Raptor is a Presto connector (presto-raptor) used to power some critical interactive query workloads in Meta (previously Facebook). Though referred to in the ICDE 2019 paper Presto: SQL on Everything, it remains somewhat mysterious to many Presto users because there is no available documentation for this feature. This article will shed some light on the history of Raptor and why Meta eventually replaced it in favor of a new architecture based on local caching, namely RaptorX.

The Story of Raptor

Generally speaking, Presto, as a query engine, does not own storage. Instead, connectors were developed to query different external data sources. This framework is very flexible, but it is hard to offer low-latency guarantees in disaggregated compute and storage architectures: network and storage latency make it difficult to avoid variability. To address this limitation, Raptor was designed as a shared-nothing storage engine for Presto.

High Availability with MySQL Fabric: Part II

This is the third post in our MySQL Fabric series. If you missed the previous two, we started with an overall introduction, and then a discussion of MySQL Fabric’s high-availability (HA) features. MySQL Fabric was RC when we started this series, but it went GA recently. You can read the press release here, and see this blog post from Oracle’s Mats Kindahl for more details. In our previous post, we showed a simple HA setup managed with MySQL Fabric, including some basic failure scenarios. Today, we’ll present a similar scenario from an application developer’s point of view, using the Python Connector for the examples. If you’re following the examples on these posts, you’ll notice that the UUID for servers will be changing. That’s because we rebuild the environment between runs. Symbolic names stay the same though. That said, here’s our usual 3 node setup:

Here’s how Bell was Hacked: SQL Injection Blow-by-Blow

OWASP’s number one risk in the Top 10 has featured prominently in a high-profile attack, this time resulting in the leak of over 40,000 records from Bell in Canada. It was pretty self-evident from the original info leaked by the attackers that SQL injection had played a prominent role in the breach, but now we have some pretty conclusive evidence of it as well:

The usual fanfare quickly followed – announcements by the attackers, silence by the impacted company (at least for the first day), outrage by affected customers and the new normal for public breaches: I got the data loaded into Have I been pwned? and searchable as soon as I’d verified it.

Building a Data Warehouse, Part 5: Application Development Options

In Part I we looked at the advantages of building a data warehouse independent of cubes/a BI system, and in Part II we looked at how to architect a data warehouse's table schema. In Part III, we looked at where to put the data warehouse tables, and in Part IV we looked at how to populate those tables and keep them in sync with your OLTP system. Today, in this last part of the series, we will take a quick look at the benefits of building the data warehouse before we need it for cubes and BI by exploring our reporting and other options.

10 More Common Mistakes Java Developers Make when Writing SQL

I was positively surprised to see how popular my recent listing about 10 common mistakes Java developers make when writing SQL was, both on my own blog and on my syndication partner DZone. The popularity shows a couple of things.

Anyway, the common mistakes I listed previously are far from complete, so I will treat you to a sequel of 10 subtly less common, yet equally interesting mistakes Java developers make when writing SQL.

Building Your Security Strategy (Case Study)

This article is sponsored by Wix

What should you focus on when designing your security strategy? This question becomes more and more tricky as your organization grows and matures. At an initial stage, you might be able to make do with a periodic penetration test. But you will soon find that as you scale up to hundreds and thousands of services, some of the procedures have to change. The focus shifts from project-based assessments to building and maintaining a lasting mindset and framework with security at the core, so you can minimize risk across your environment.

In this article, we’ll share some guiding principles and ideas for incorporating security by design into your own development process, taken from our work at Wix serving 220M+ users.

First And Foremost: Security By Design

Also known as security by default, security by design (SbD) is a concept in which we aim to “limit the opportunities” for making security-related mistakes. Consider a case where a developer builds a service to query a database. If the developer is required (or allowed) to build queries “from scratch,” writing SQL directly into their code, they can very well end up introducing SQL Injection (SQLi) vulnerabilities. However, with a security-by-default approach, the developer can get a safe Object-Relational Mapping (ORM), letting the code focus on logic while the DB interactions are left to the ORM libraries. By ensuring the ORM library is safe once, we are able to block SQLi everywhere (or at least everywhere the library is used). This approach might restrict some developer liberties, but except for specific cases, the security benefits tend to outweigh the cons.
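
Under the hood, a safe ORM gets this protection by issuing parameterized queries instead of concatenating user input into SQL text. Here is a sketch of the difference at the SQL level, using MySQL-style prepared statements; the table and values are illustrative:

-- Unsafe pattern: query text assembled from raw user input (do not do this)
--   "SELECT id, email FROM users WHERE email = '" + userInput + "'"

-- Safer pattern: the query shape is fixed and the value is bound as a parameter
PREPARE find_user FROM 'SELECT id, email FROM users WHERE email = ?';
SET @email = 'alice@example.com';
EXECUTE find_user USING @email;
DEALLOCATE PREPARE find_user;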

That previous example is rather well known, and if you use a mature application development framework, you’re probably using an ORM anyway. But the same logic can be applied to other types of vulnerabilities and issues. Input validation? Do this by default using your app framework, according to the declared variable type. What about Cross-Site Request Forgery (CSRF)? Solve it for everyone in your API gateway server. Authorization confusion? Create a central identity resolution logic to be consumed by all other services.
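As one illustration of “validation by default”, here is a hedged sketch using Jakarta Bean Validation annotations on a hypothetical request object; a framework that integrates with Bean Validation (Spring, for example, via @Valid) rejects invalid input before any handler code runs, so every endpoint gets the same checks without developers writing them by hand:

```java
import jakarta.validation.constraints.Email;
import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Size;

// Hypothetical request object: constraints are declared once on the type,
// and the framework enforces them for every endpoint that accepts it.
public class CreateUserRequest {

    @NotBlank
    @Size(max = 64)
    private String displayName;

    @NotBlank
    @Email
    private String email;

    public String getDisplayName() { return displayName; }

    public String getEmail() { return email; }
}
```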

By following this methodology, we’re able to allow our developers the freedom to move quickly and efficiently, without needing to introduce security as a “blocker” in later stages before new features go live.

1. Establish Secure Defaults For Your Services

Take the time to ensure that your services are served by default with secure settings. For example, users should not need to actively choose to make their data private. Instead, the default should be “private” and users can have the option to make it public if they choose to. This of course depends on product decisions as well, but the concept stands. Let’s look at an example. When you build a site on our platform, you can easily set up a content “Collection”, which is like a simplified database. By default, editing permissions to this collection are restricted to admin users only, and the user has the option to expose it to other user types using the Roles & Permissions feature. The default is secure.

2. Apply The Principle Of Least Privilege (PoLP)

Put simply, users shouldn’t have permission for stuff they don’t need. A permission granted is a permission used, or if not needed, then abused. Let’s look at a simple example: When using Wix, which is a secure system with support for multiple users, a website owner can use Roles & Permissions to add a contributor, say with a Blog Writer role, to their site. As derived from the name, you would expect this user to have permissions to write blogs. However, would this new contributor have permissions, for example, to edit payments? When you put it like this, it sounds almost ridiculous. But the “least permission” concept (PoLP) is often misunderstood. You need to apply it not only to users, but also to employees, and even to systems. This way even if you are vulnerable to something like CSRF and your employees are exploited, the damage is still limited.

In a rich microservice environment, thinking about least permission might become challenging. Which permission should Microservice A have? Should it be allowed to access Microservice B? The most straightforward way to tackle this question is simply starting with zero permissions. A newly launched service should have access to nothing. The developer, then, would have an easy, simple way to extend their service permission, according to need. For example, a “self service” solution for allowing developers to grant permissions for services to access non-sensitive databases makes sense. In such an environment, you can also look at sensitive permissions (say for a database holding PII data), and require a further control for granting permissions to them (for example, an OK from the data owner).
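The “start with zero permissions” idea can be sketched in a few lines; the class and method names below are illustrative, not an actual Wix API, and a real system would back the grants with persistent storage plus an approval step for sensitive resources:

```java
import java.util.HashSet;
import java.util.Set;

// Deny-by-default access control: a newly registered service has no grants,
// and every permission must be added explicitly (e.g., via a self-service flow).
public class ServicePermissions {

    private final Set<String> grantedResources = new HashSet<>();

    // Called by the (hypothetical) self-service grant flow.
    public void grant(String resource) {
        grantedResources.add(resource);
    }

    public boolean canAccess(String resource) {
        // Zero permissions at start: anything not explicitly granted is denied.
        return grantedResources.contains(resource);
    }
}
```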

3. Embrace The Principle Of Defense In Depth (DiD)

As beautifully put by a colleague, security is like an onion — it’s made of many layers built on top of layers, and it can make you cry. In other words, when building a secure system, you need to account for different types of risk and threats, and subsequently you need to build different types of protections on top of others.

Again, let’s look at a simple example of a login system. The first security gateway you can think of in this context is the “user-password” combination. But as we all know, passwords can leak, so one should always add a second layer of defense: two-factor authentication (2FA), also known as multi-factor authentication (MFA). Wix encourages users to enable this feature for their account security. And by now, MFA is pretty standard — but is it enough? Can we assume that someone who successfully logged into the system is now trusted?

Unfortunately, not always. We looked until now at one type of attack (password stealing), and we provided another layer to protect against it, but there are certainly other attacks. For example, if we don’t protect ourselves, a Cross Site Scripting (XSS) attack can be used to hijack a user’s sessions (for example by stealing the cookies), which is as good as a login bypass. So we need to consider added layers of defense: cookie flags to prevent JS access (HTTP only), session timeouts, binding a session to a device, etc. And of course, we need to make sure we don’t expose XSS issues.
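As a small example of stacking those session-level defenses, here is a sketch using the Jakarta Servlet API (assumed here purely for illustration): it sets the HttpOnly and Secure cookie flags and a short idle timeout, while SameSite and device binding would typically be handled in container configuration or middleware:

```java
import jakarta.servlet.http.Cookie;
import jakarta.servlet.http.HttpServletResponse;
import jakarta.servlet.http.HttpSession;

// Illustrative helper that layers the session defenses described above.
public class SessionHardening {

    public void hardenSession(HttpServletResponse response, HttpSession session, String sessionId) {
        Cookie cookie = new Cookie("SESSION", sessionId);
        cookie.setHttpOnly(true);   // not readable from JavaScript, blunting XSS cookie theft
        cookie.setSecure(true);     // only sent over HTTPS
        cookie.setPath("/");
        response.addCookie(cookie);

        // Short idle timeout limits how long a hijacked session stays useful.
        session.setMaxInactiveInterval(15 * 60); // 15 minutes
    }
}
```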

You can look at this concept in another way. When writing a feature, you should protect it almost “from scratch”, assuming all other defenses might have been broken. That doesn’t mean writing every line of code again; it just means being aware that certain assumptions cannot be made. For example, you can’t assume that just because your service does not have an externally reachable endpoint, it has never been accessed by malicious entities. An attacker exploiting Server-Side Request Forgery (SSRF) issues can hit your endpoint any minute. Is it protected against such issues?

At Wix, we assume a “breach mindset” at all times, meaning each developer assumes the controls leading up to the application they’re working on have already been breached. That means checking permissions, input validations, and even logic: we never assume that the services before us have already done those checks.

4. Minimize Attack Surface Area

What’s the safest way to secure a server? Disconnect it from the power socket. Jokes aside, while we don’t want to turn our services off just to ensure they’re not abused, we certainly don’t want to leave them on if they serve no real function. If something is not needed or being used, it should not be online.

The most straightforward way to understand this concept is by looking at non-production environments (QA, staging, etc). While such environments are often needed internally during the development process, they have no business being exposed such that external users can access them. Being publicly available means they can serve as a target for an attack, as they are not “production ready” services (after all, they are in the testing phase). The probability for them to become vulnerable increases.

But this concept doesn’t apply only to whole environments. If your code contains unused or unnecessary methods, remove them before pushing to production. Otherwise, they become pains instead of assets.

5. Fail Securely

If something fails, it should do so securely. If that’s confusing, you’re not alone. Many developers overlook this principle or misunderstand it. Imagining every possible edge case on which your logic can fail is almost impossible, but it is something you need to plan for, and more often than not it’s another question of adopting the right mindset. If you assume there will be failures, then you’re more likely to include all possibilities.

For instance, a security check should have two possible outcomes: allow or deny. The credentials entered are either correct, or they’re not. But what if the check fails entirely, say, because of an unexpected power outage at the database server? Your code keeps running, but you get a “DB not found” error. Did you consider that?

In this particular instance, the answer is probably “yes”, you thought of it, either because your framework forced you to consider it (such as Java’s “checked exceptions”) or simply because it actually happens often enough that your code failed in the past. But what if it is something more subtle? What if, for example, your SQL query fails due to non-unicode characters that suddenly appeared as input? What if your S3 bucket suddenly had its permissions changed and now you can’t read from it anymore? What if the DNS server you’re using is down and suddenly instead of an NPM repo you’re hitting a compromised host?

These examples might seem ludicrous to you, and it would be even more ludicrous to expect you to write code to handle them. What you should do, however, is expect things to behave in unexpected ways and make sure that when such things occur, you “fail securely”, for example by returning an error and stopping the execution flow.

It would make no sense to continue the login flow if the DB server is down, and it would make no sense to continue the media processing if you can’t store that image on that bucket. Break the flow, log the error, alert the relevant channel — but don’t drop your security controls in the process.
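One way to encode “fail securely” is to make denial the default outcome of any unexpected failure. The sketch below (a hypothetical permissions table, plain JDBC used only for illustration) returns “deny” when the permission check itself blows up, rather than letting the flow continue as if the check had passed:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

public class AccessChecker {

    private final DataSource dataSource;

    public AccessChecker(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Fail securely: any unexpected failure (DB down, bad input, etc.)
    // results in "deny", never in skipping the check.
    public boolean isAllowed(String userId, String resourceId) {
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT 1 FROM permissions WHERE user_id = ? AND resource_id = ?")) {
            ps.setString(1, userId);
            ps.setString(2, resourceId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        } catch (Exception e) {
            // Log and alert, then stop the flow with a denial rather than
            // continuing as if the check had succeeded.
            return false;
        }
    }
}
```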

6. Manage Your Third-Party Risk

Most modern applications use third-party services and/or import third-party code to enhance their offering. But how can we ensure secure integrations with third parties? We think about this principle a lot at Wix, as we offer third-party integrations to our user sites in many ways. For example, users can install apps from our App Market or add third-party software to their websites using our full-stack development platform called Velo.

Third-party code can be infiltrated, just like your own, but has the added complication that you have no control over it. NPM Node libraries, for instance, are some of the most used in the world. But recently a few well-known cases involved them being compromised, leaving every site that used them exposed.

The most important thing is to be aware that this might happen. Keep track of all your open-source code in a software bill of materials (SBOM), and create processes for regularly reviewing it. If you can, run regular checks of all your third-party suppliers’ security practices. For example, at Wix we run a strict Third-Party Risk Management Program (TPRM) to vet third parties and assess security while working with them.

7. Remember Separation Of Duties (SoD)

Separation of duties really boils down to making sure tasks are split into (and limited to) appropriate user types, though this principle could also apply to subsystems.

The administrator of an eCommerce site, for example, should not be able to make purchases. And a user of the same site should not be promoted to administrator, as this might allow them to alter orders or give themselves free products.

The thinking behind this principle is simply that if one person is compromised or acting fraudulently, their actions shouldn’t compromise the whole environment.

8. Avoid Security By Obscurity

If you write a backdoor, it will be found. If you hard-code secrets in your code, they will be exposed. It’s not a question of “if”, but “when” — there is no way to keep things hidden forever. Hackers spend time and effort on building reconnaissance tools to target exactly these types of vulnerabilities (many such tools can be found with a quick Google search), and more often than not when you point at a target, you get a result.

The bottom line is simple: you cannot rely on hidden features to remain hidden. Instead, there should be enough security controls in place to keep your application safe when these features are found.

For example, it is common to generate access links based on randomly generated UUIDs. Consider a scenario where an anonymous user makes a purchase on your store, and you want to serve the invoice online. You cannot protect the invoice with permissions, as the user is anonymous, but it is sensitive data. So you would generate a “secret” UUID, build it into the link, and treat the “knowledge” of the link as “proof” of identity ownership.

But how long can this assumption remain true? Over time, such links (with UUID in them) might get indexed by search engines. They might end up on the Wayback Machine. They might be collected by a third-party service running on the end user’s browser (say a BI extension of some sort), then collected into some online DB, and one day accessed by a third party.

Adding a short time limit to such links (based on UUIDs) is a good compromise. We don’t rely on the link staying secret for long (so there’s no security by obscurity), just for a few hours. When the link gets discovered, it’s already no longer valid.
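The article only prescribes a time limit on such links; one common way to implement it (shown in this sketch with an invented URL and names, and not necessarily how Wix does it) is to embed an expiry timestamp plus an HMAC signature, so a leaked link stops working after a few hours and the expiry cannot be tampered with:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.time.Instant;
import java.util.Base64;

// Hypothetical expiring link: the URL carries an expiry timestamp and an HMAC
// over (invoiceId + expiry), so knowing the link proves nothing once it expires.
public class ExpiringLink {

    private final byte[] secretKey;

    public ExpiringLink(byte[] secretKey) {
        this.secretKey = secretKey;
    }

    public String buildInvoiceLink(String invoiceId, long validForSeconds) throws Exception {
        long expiresAt = Instant.now().getEpochSecond() + validForSeconds;
        String signature = sign(invoiceId + ":" + expiresAt);
        return "https://example.com/invoices/" + invoiceId
                + "?exp=" + expiresAt + "&sig=" + signature;
    }

    public boolean isValid(String invoiceId, long expiresAt, String signature) throws Exception {
        if (Instant.now().getEpochSecond() > expiresAt) {
            return false; // even a leaked link is useless once expired
        }
        String expected = sign(invoiceId + ":" + expiresAt);
        // Constant-time comparison so the signature can't be guessed byte by byte.
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.UTF_8),
                signature.getBytes(StandardCharsets.UTF_8));
    }

    private String sign(String payload) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secretKey, "HmacSHA256"));
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(mac.doFinal(payload.getBytes(StandardCharsets.UTF_8)));
    }
}
```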

9. Keep Security Simple

Also known as KISS, or keep it simple, stupid. As developers, we need to keep users in mind at all times. If a security mechanism is too complicated to use, its users might not understand it, and will end up bypassing it or using it incorrectly.

Take 2FA for example. We all know it’s more secure, but the process also involves a degree of manual setup. Making it as simple as possible to follow means more users will follow it, and not compromise their own accounts with weaker protections.

Adding new security functionality always makes a system more complex, so it can have an unintended negative impact on security. So keep it simple. Always weigh the value of new functionality against its complexity, and keep security architecture as simple as possible.

10. Fix Security Issues, Then Check Your Work

Thoroughly fixing security issues is important for all aspects of a business. At Wix, for example, we partner with ethical hackers through our Bug Bounty Program to help us find issues and vulnerabilities in our system, and practice fixing them. We also employ internal security and penetration testing, and the security team is constantly reviewing the production services, looking for potential bugs.

But fixing a bug is just the start. You also need to understand the vulnerability thoroughly before you fix it, and often get whoever spotted it to check your fix too. And then, when a bug is fixed, carry out regression tests to make sure it’s not reintroduced by code rollbacks. This process is crucial to make sure you’re actually advancing your application security posture.

Conclusion

By implementing security by design at Wix, we were able to build a robust and secure platform — and we hope that sharing our approach will help you do the same. We applied these principles not just to security features, but to all components of our system. We recommend considering this, whether you build from scratch or choose to rely on a secure platform like ours.

More importantly, following security by design instilled a security mindset into our company as a whole, from developers to marketing and sales. Cybersecurity should be a top priority in everyone’s mind, as attacks increase and hackers find new ways of accessing sensitive information.

Taking a defensive position right from the start will put you at an advantage. Because when thinking about cybersecurity, it’s not if a breach happens. It’s when.

  • For more information on security by design, visit the Open Web Application Security Project. This non-profit community is dedicated to securing the web, and produces a range of free open-source tools, training and other resources to help improve software security.
  • To learn more about secure practices at Wix, check out wix.com/trust-center/security.