The Lessons We Learned From Programming at Google w/ Hyrum Wright and Titus Winters – Part 1

Google is a titan of technology with one of the largest codebases in the world. Managing and scaling this amount of code is a monumental task. And while Google doesn't profess to have all the answers, you can probably learn a thing or two from their journey.

In the first episode of a two-part series, Senior Google Staff Engineers Hyrum Wright and Titus Winters joined me on the Dev Interrupted podcast to discuss lessons learned from programming at Google and to talk about their new book.

Building a Database Written in Node.js From the Ground Up

The founding team at HarperDB built the first and only database written in Node.js. A few months back, our CEO Stephen Goldberg was invited to speak at a Women Who Code meetup to share the story of this (what some called crazy) endeavor. Stephen discussed the architectural layers of the database, demonstrated how to build a highly scalable and distributed product in Node.js, and demoed the inner workings of HarperDB. You can watch his talk at the link above, and even read a post from back in 2017, but since we all love Node.js and it’s an interesting topic, I’ll summarize here.

The main (and simplest) reason we chose to build a database in Node is that we knew it really well. We got flak for not choosing Go, but people now accept that Go and Node are essentially head to head (in popularity and community support). Zach, one of our co-founders, recognized that the time it would have taken to learn a new language would never have been worth it.

Hot Database Connections for Serverless Functions

Preamble — Problems to Solve

Firing up new Serverless containers — a.k.a. cold starts — takes from one to several seconds (time varies per platform). To eliminate such cost/latency, Serverless frameworks keep already started containers warm for a period of time (duration varies per provider).

Serverless functions may also need database access. Although less costly than starting a new Serverless container, creating and tearing down a database connection may cost tens or even hundreds of milliseconds, depending on your DBMS environment. The problem is exacerbated with Serverless functions, which are short-lived and cannot afford such a cost on every call.
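
One common way to avoid paying the connection cost on every call is to keep the connection in module scope, so that it survives for as long as the container stays warm. The sketch below is a minimal illustration under stated assumptions, not any provider's API: `handler` and `get_connection` are hypothetical names, and an in-memory SQLite database stands in for a real DBMS.

```python
import sqlite3

# Module-level state lives as long as the warm container does,
# so it is shared by every invocation that lands on this container.
_connection = None
_connections_opened = 0  # counter kept only to make the reuse visible


def get_connection():
    """Return the cached connection, opening one only on first use."""
    global _connection, _connections_opened
    if _connection is None:
        # Assumption: SQLite in memory stands in for a networked database.
        _connection = sqlite3.connect(":memory:")
        _connections_opened += 1
    return _connection


def handler(event, context=None):
    """Hypothetical function entry point; signature is an assumption."""
    conn = get_connection()
    row = conn.execute("SELECT ? + 1", (event["value"],)).fetchone()
    return {"result": row[0], "connections_opened": _connections_opened}


first = handler({"value": 1})
second = handler({"value": 41})
```

Both calls share a single connection, so only the first one pays the setup cost. A production version would also need to detect and replace a connection that went stale while the container was idle.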

Developers and Databases

To learn about the current and future state of databases, we spoke with and received insights from 19 IT professionals. We asked, "What advanced database knowledge or skills do developers need?" Here’s what they shared with us:

Conceptual

  • More development is happening from top to bottom. Full stack development is how developers need to build things. Vendors like AWS and Azure all have tools to enable this. Database developers are becoming data engineers. Learn basic stats. Have breadth around Java, Go, Python, SQL. Keep in mind that you are building a service to help a customer deliver a solution to a problem.
  • App developers need to have a rudimentary understanding of how the data is being stored and accessed. How is the database being used and what are you trying to achieve with the application? Select the database based on the requirements of the applications. What requirements does your application put on the database? Database vendors are not as forthcoming with limitations.
  • Understand the product. Try to do some upfront work before you get into the specific things you are trying to do. Understand how the product works. Some products have clusters and nodes that need to talk to each other, and knowing this shapes how you write the application. Take NoSQL with three different nodes in the cluster: node one may work, but nodes two and three may not. A lot of developers think the data is the easy part, and it’s not. It’s the part you cannot rewrite.
  • Being able to determine the right tool for the job is the knowledge that more developers need. The database landscape is flooded with solutions (some claim to be a jack-of-all-trades, while others have a more niche focus). But is the solution you picked going to be the right tool for the job? We see people pick Apache Cassandra because it’s scalable and reliable, but in talking to them it’s clear that a more traditional relational database would make more sense. And vice versa, where companies are hitting limits on their relational database and have a perfect use case for Cassandra. Either way, it’s important for more developers to hone skills around the fine details between different databases and knowing (with confidence) which one is most applicable to a specific use case.
  • In a world of managed services, developers should care less about deep database knowledge. This allows them to focus on the apps that differentiate their businesses and unlock new revenue streams from the data they own. They shouldn’t have to be experts on database management, but rather in building data-driven apps.
  • Developers need to understand how to interact with databases. Understanding database architectures, database design, access paths/optimizations, and even how data is arranged on disk or in memory is useful, particularly if you're a backend developer. The following are important: basic data modeling techniques; normalization vs. denormalization; understanding SQL and when to use foreign keys; understanding execution plans; using prepared statements; the different types of joins available, depending on the database; and understanding data obfuscation and encryption. Nowadays, developers tend to reference the database APIs to retrieve data, so they assume they don’t need database knowledge. This is not the case.
  • 1) Understand conceptually what’s going on, and application architecture becomes much easier. Databases are providing easier integrations and streaming, even if you are building on-prem. Choose the database for the business application logic. Expect more as database technology advances and the cloud simplifies things from an operational perspective. 2) Be familiar with different languages and paradigms for different use cases: analytic applications, operational applications, transactional applications. NodeJS for a web app, Python for a data-driven application, applications in Java. Use the right language and platform. 3) How can I leverage AI/ML as part of my application to create smarter and more intelligent applications? 4) Microservices-based architectures, such as event-driven applications. 5) Containerization to speed up the process of building, testing, and releasing software.
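
Two of the skills named in the list above, using prepared statements and reading execution plans, can be tried out with nothing more than Python's built-in `sqlite3` module. The following is a minimal sketch, not a recommendation for any particular database: the `users` table, the `idx_users_email` index, and the helper names are all hypothetical, and the plan text shown by other databases (or other SQLite versions) will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [("a@example.com", "Ada"), ("b@example.com", "Bob")],
)

LOOKUP_SQL = "SELECT id, name FROM users WHERE email = ?"


def find_user(conn, email):
    # The "?" placeholder binds the value as data, never as SQL text,
    # so hostile input cannot change the shape of the query.
    return conn.execute(LOOKUP_SQL, (email,)).fetchone()


def query_plan(conn, sql, params=()):
    # EXPLAIN QUERY PLAN is SQLite-specific; the last column of each
    # row holds the human-readable description of the step.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


plan_before = query_plan(conn, LOOKUP_SQL, ("a@example.com",))
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, LOOKUP_SQL, ("a@example.com",))

print(plan_before)  # a full table scan before the index exists
print(plan_after)   # an index search once idx_users_email is available
```

Comparing the two plans shows why the list pairs these skills: the query text stays the same, while the access path the database chooses changes once an index exists.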

Data Science

  • Cloud is a big skill that developers should have. We’re training all of our developers on the cloud and cloud technology. More databases will be deployed on the cloud with many vendors. Know solution architecture on the cloud and machine learning. Even if you’re not designing algorithms or doing ML as part of your day-to-day job, you will need to know how to implement ML tools and toolkits in the future. We are seeing some coalescence of data science and development. Speak the same language. Reduce wasteful cycles in a different language. Developers need a better grasp of data science concepts, while scientists need to understand data and database management.
  • Understanding the power of distribution and being able to adopt data structures that match such infrastructures. It will also be important to have some know-how when it comes to data science, as understanding data will be even more important (especially when it comes to real-time data, which raises the next important piece of knowledge: real-time tooling). There are a lot of messaging queues and distributed processing frameworks that work well with databases and are required to handle the load. And last but not least, machine learning skills will become increasingly important.

Other

  • What kind of storage and persistence do you need to solve the problem you are trying to solve?
  • 1) Education: no matter your title, research technology and get excited about it. Spend the time to learn about the technology, but stay fluid about tools that can be used to solve problems. 2) Think long and hard about the application release process and know there are ways to bring databases into the workflow. 3) Take the time to challenge the old tradition of doing things and reimagine what the database process looks like in a DevOps workflow. Embrace the philosophy of doing it differently.
  • Developers should apply best practices when it comes to database security. The database must be secured both at rest and during transport. Shortcuts are highly discouraged when it comes to database protection/security.
  • Governance: a lot of people forget the importance of best practices in governance. Tests for CI and unit tests, checking code, using case tracking tools, all the best practices. Modeling: given that developers have driven the NoSQL revolution, understanding how data is modeled and how to model it should be part of how you choose a database. What kind of relationships do you want to discover, and what nodes do you want to express? It’s important for companies to reassess how they assess database technologies. With 300+ databases out there, a lot of basic things that weren’t on the radar ten years ago are important today. You can’t take for granted that new database technologies have the security, performance, reliability, and trustability you need and want.
  • Developers need to understand SQL to be able to run powerful and complex queries. SQL is still the lingua franca of the data world; the appearance of NoSQL and Hadoop did not diminish its utility. SQL's advantage is that it is an industry standard, with an entire ecosystem and many learning resources around it. But it is a complex language that requires time to learn.
  • What should vendors be doing? Abstract away the complexity. Provide developers with the right kind of RESTful APIs and SDKs. Abstract away GPUs. Developers need to understand the roles of the new types of people in the organization. Understand what goes into creating an ML algorithm and the lineage of how it came to be. More collaboration between the data and the development teams. Break down barriers.
  • Technical aspects are non-quantitative. Place more importance on soft skills: how to better interact with your customers and the business side of the house to determine and meet their needs in a shorter timeframe.

Here are the contributors of insight, knowledge, and experience: