AI Advancement for API and Microservices

Recent AI advancements in API technology involve enhancing natural language processing capabilities, improving algorithmic decision-making through reinforcement learning, and expanding AI integration across diverse sectors like healthcare, finance, and e-commerce to create more intelligent, adaptable, and tailored API solutions.

Key Trends and Advancements

AutoML for APIs

AutoML (Automated Machine Learning) tools are increasingly being used to automate the development of machine learning models that can be exposed through APIs. This streamlines the process of building AI-powered APIs by reducing the need for manual intervention in model training and deployment.

Secure Your API With These 16 Practices With Apache APISIX (Part 1)

A couple of months ago, I stumbled upon this list of 16 practices to secure your API:

  1. Authentication: Verifies the identity of users accessing APIs.
  2. Authorization: Determines permissions of authenticated users.
  3. Data redaction: Obscures sensitive data for protection.
  4. Encryption: Encodes data so only authorized parties can decode it.
  5. Error handling: Manages responses when things go wrong, avoiding revealing sensitive info.
  6. Input validation and data sanitization: Checks input data and removes harmful parts.
  7. Intrusion detection systems: Monitor networks for suspicious activities.
  8. IP Whitelisting: Permits API access only from trusted IP addresses.
  9. Logging and monitoring: Keeps detailed logs and regularly monitors APIs.
  10. Rate limiting: Limits user requests to prevent overload.
  11. Secure dependencies: Ensures third-party code is free from vulnerabilities.
  12. Security headers: Enhances site security against types of attacks like XSS.
  13. Token expiry: Regularly expiring and renewing tokens prevents unauthorized access.
  14. Use of security standards and frameworks: Guides your API security strategy.
  15. Web application firewall: Protects your site from HTTP-specific attacks.
  16. API versioning: Maintains different versions of your API for seamless updates.

While it's debatable whether some points relate to security, e.g., versioning, the list is a good starting point anyway. In this two-post series, I'd like to describe how we can implement each point with Apache APISIX (or not).

Will Slow Requests in API Gateway Affect Other Requests?

A frequently discussed concern in the realm of API gateways is the ability to efficiently handle a substantial number of concurrent requests. Specifically, the question arises: will slow requests significantly increase the response time of other normal requests in the API gateway?

The answer is that APISIX excels in this regard, demonstrating that the slow requests do not adversely impact other normal requests. However, for API gateway products based on different languages and software architectures, the performance may not be as favorable.
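
APISIX owes this behavior to its non-blocking, event-driven architecture (it is built on NGINX and OpenResty). The snippet below is not APISIX itself; it is a minimal Node.js sketch of the same principle, showing that an I/O-bound slow request leaves the event loop free to answer other requests immediately.

// Minimal sketch (not APISIX): an event-driven server where a slow, I/O-bound request
// does not delay other requests, because the worker never blocks while waiting.
const http = require("http");

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

http
  .createServer(async (req, res) => {
    if (req.url === "/slow") {
      await sleep(5000); // simulates a slow upstream; the event loop stays free
      res.end("slow response\n");
    } else {
      res.end("fast response\n"); // still answered immediately while /slow is pending
    }
  })
  .listen(8080);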

The View Transitions API And Delightful UI Animations (Part 1)

Animations are an essential part of a website. They can draw attention, guide users on their journey, provide satisfying and meaningful feedback to interaction, add character and flair to make the website stand out, and so much more!

On top of that, CSS has provided us with transitions and keyframe-based animations since at least 2009. Not only that, the Web Animations API and JavaScript-based animation libraries, such as the popular GSAP, are widely used for building very complex and elaborate animations.

With all these avenues for making things move on the web, you might wonder where the View Transitions API fits into all this. Consider the following example of a simple task list with three columns.

We’re merely crossfading between the two screen states, and that includes all elements within it (i.e., other images, cards, grid, and so on). The API is unaware that the image that is being moved from the container (old state) to the overlay (new state) is the same element.

We need to instruct the browser to pay special attention to the image element when switching between states. That way, we can create a special transition animation that is applied only to that element. The CSS view-transition-name property applies the name of the view transition we want to apply to the transitioning elements and instructs the browser to keep track of the transitioning element’s size and position while applying the transition.

We get to name the transition anything we want. Let’s go with active-image, which is going to be declared on a .gallery__image--active class that is a modifier of the class applied to images (.gallery__image) when the transition is in an active state:

.gallery__image--active {
  view-transition-name: active-image;
}

Note that view-transition-name has to be a unique identifier and applied to only a single rendered element during the animation. This is why we are applying the property to the active image element (.gallery__image--active). We can remove the class when the image overlay is closed, return the image to its original position, and be ready to apply the view transition to another image without worrying whether the view transition has already been applied to another element on the page.

So, we have an active class, .gallery__image--active, for images that receive the view transition. We need a method for applying that class to an image when the user clicks on that respective image. We can also wait for the animation to finish by storing the transition in a variable and awaiting its finished promise before toggling off the class and cleaning up our work.

// Start the transition and save its instance in a variable
const transition = document.startViewTransition(() => { /* ... */ });

// Wait for the transition to finish.
await transition.finished;

/* Cleanup after transition has completed */

Let’s apply this to our example:

function toggleImageView(index) {
  const image = document.getElementById(`js-gallery-image-${index}`);

  // Apply a CSS class that contains the view-transition-name before the animation starts.
  image.classList.add("gallery__image--active");

  const imageParentElement = image.parentElement;

  if (!document.startViewTransition) {
    // Fallback if View Transitions API is not supported.
    moveImageToModal(image);
  } else {
    // Start transition with the View Transitions API.
    document.startViewTransition(() => moveImageToModal(image));
  }

  // This click handler function is now async.
  overlayWrapper.onclick = async function () {
    // Fallback if View Transitions API is not supported.
    if (!document.startViewTransition) {
      moveImageToGrid(imageParentElement);
      return;
    }

    // Start transition with the View Transitions API.
    const transition = document.startViewTransition(() => moveImageToGrid(imageParentElement));

    // Wait for the animation to complete.
    await transition.finished;

    // Remove the class that contains the view-transition-name after the animation ends.
    image.classList.remove("gallery__image--active");
  };
}

Alternatively, we could have used JavaScript to toggle the CSS view-transition-name property on the element inline in the HTML. However, I would recommend keeping everything in CSS as you might want to use media queries and feature queries to create fallbacks and manage it all in one place.

// Applies view-transition-name to the image
image.style.viewTransitionName = "active-image";

// Removes view-transition-name from the image
image.style.viewTransitionName = "none";

And that’s pretty much it! Let’s take a look at our example (in Chrome) with the transition element applied.

Customizing Animation Duration And Easing In CSS

What we just looked at is what I would call the default experience for the View Transitions API. We can do so much more than a transition that crossfades between two states. Specifically, just as you might expect from something that resembles a CSS animation, we can configure a view transition’s duration and timing function.

In fact, the View Transitions API makes use of CSS animation properties, and we can use them to fully customize the transition’s behavior. The difference is what we declare them on. Remember, a view transition is not part of the DOM, so what is available for us to select in CSS if it isn’t there?

When we run the startViewTransition function, the API captures the current state of the page, pauses rendering, applies the DOM update, captures the new state, and constructs a pseudo-element tree:

::view-transition
└─ ::view-transition-group(root)
   └─ ::view-transition-image-pair(root)
      ├─ ::view-transition-old(root)
      └─ ::view-transition-new(root)

Each one is helpful for customizing different parts of the transition:

  • ::view-transition: This is the root element, which you can consider the transition’s body element. The difference is that this pseudo-element is contained in an overlay that sits on top of everything else on the page.
    • ::view-transition-group: This mirrors the size and position between the old and new states.
      • ::view-transition-image-pair: This is the only child of ::view-transition-group, providing a container that isolates the blending work between the snapshots of the old and new transition states, which are direct children.
        • ::view-transition-old(...): A snapshot of the “old” transition state.
        • ::view-transition-new(...): A live representation of the new transition state.

Yes, there are quite a few moving parts! But the purpose of it is to give us tons of flexibility as far as selecting specific pieces of the transition.

So, remember when we applied view-transition-name: active-image to the .gallery__image--active class? Behind the scenes, the following pseudo-element tree is generated, and we can use the pseudo-elements to target either the active-image transition element or other elements on the page with the root value.

::view-transition
├─ ::view-transition-group(root)
│  └─ ::view-transition-image-pair(root)
│     ├─ ::view-transition-old(root)
│     └─ ::view-transition-new(root)
└─ ::view-transition-group(active-image)
   └─ ::view-transition-image-pair(active-image)
      ├─ ::view-transition-old(active-image)
      └─ ::view-transition-new(active-image)

In our example, we want to modify both the cross-fade (root) and transition element (active-image) animations. We can use the universal selector (*) with the pseudo-element to change animation properties for all available transition elements, or target pseudo-elements for specific animations using their view transition name.

/* Apply these styles only if API is supported */
@supports (view-transition-name: none) {
  /* Cross-fade animation */
  ::view-transition-image-pair(root) {
    animation-duration: 400ms;
    animation-timing-function: ease-in-out;
  }

  /* Image size and position animation */
  ::view-transition-group(active-image) {
    animation-timing-function: cubic-bezier(0.215, 0.61, 0.355, 1);
  }
}

Accessible Animations

Of course, any time we talk about movement on the web, we also ought to be mindful of users with motion sensitivities and ensure that we account for an experience that reduces motion.

That’s what the CSS prefers-reduced-motion query is designed for! With it, we can detect users who have enabled accessibility settings at the OS level that reduce motion and then reduce motion on our end. The following example is a heavy-handed solution that nukes all animation in those instances, but it’s worth calling out that reduced motion does not always mean no motion. So, while this code will work, it may not be the best choice for your project, and your mileage may vary.

@media (prefers-reduced-motion) {
  ::view-transition-group(*),
  ::view-transition-old(*),
  ::view-transition-new(*) {
    animation: none !important;
  }
}

Final Demo

Here is the completed demo with fallbacks and prefers-reduced-motion snippet implemented. Feel free to play around with easings and timings and further customize the animations.

This is a perfect example of how the View Transitions API tracks an element’s position and dimensions during animation and transitions between the old and new snapshots right out of the box!

See the Pen Add to cart animation v2 - completed [forked] by Adrian Bece.

Conclusion

It amazes me every time how the View Transitions API turns expensive-looking animations into somewhat trivial tasks with only a few lines of code. When done correctly, animations can breathe life into any project and offer a more delightful and memorable user experience.

That all being said, we still need to be careful how we use and implement animations. For starters, we’re still talking about a feature that is supported only in Chrome at the time of this writing. But with Safari’s positive stance on it and an open ticket to implement it in Firefox, there’s plenty of hope that we’ll get broader support — we just don’t know when.

Also, the View Transitions API may be “easy,” but it does not save us from ourselves. Think of things like slow or repetitive animations, needlessly complex animations, serving animations to those who prefer reduced motion, among other poor practices. Adhering to animation best practices has never been more important. The goal is to ensure that we’re using view transitions in ways that add delight and are inclusive rather than slapping them everywhere for the sake of showing off.

In another article to follow this one, we’ll use the View Transitions API to create full-page transitions in our single-page and multi-page applications — you know, the sort of transitions we see when navigating between two views in a native mobile app. Now, we have those readily available for the web, too!

Until then, go build something awesome… and use it to experiment with the View Transitions API!

Architecting a Comprehensive Testing Framework for API and UI Testing

In the ever-evolving landscape of software development, quality assurance and testing play pivotal roles in ensuring that applications function seamlessly and deliver a superior user experience. To achieve this, a well-designed and versatile testing framework is essential. In this article, we delve into the architecture and design principles of a testing framework capable of handling both API and UI testing. Such a framework not only optimizes the testing process but also enhances the overall quality of software.

Introduction

Testing in software development is a multifaceted process, encompassing a range of testing types such as unit testing, integration testing, and system testing. API and UI testing stand out as critical components of this ecosystem. API testing ensures that the backend services and data interactions are robust and error-free, while UI testing focuses on the user interface and user experience.

Optimizing API Lifecycles: A Comprehensive Guide for Product Managers

In this article, we will delve into the intricacies of optimizing API lifecycles—an essential aspect for product managers navigating the dynamic landscape of digital integration. From conceptualization to retirement, understanding and implementing best practices throughout the API lifecycle is crucial for creating robust, scalable, and future-proofed integrations.

The Birth of an API: Conceptualization and Design

Identifying Business Needs

Before the first line of code is written, product managers must collaborate with stakeholders to identify business needs. For instance, envision a scenario where the demand for real-time transaction data prompts the creation of a new API, enhancing interoperability among financial systems.

OpenAI’s DevDay Unveils GPT-4 Turbo: Consequences & Questions to Consider

Yesterday, OpenAI's inaugural DevDay conference in San Francisco unveiled a series of groundbreaking announcements, leaving the tech community humming with both excitement and a degree of uncertainty. The reveal of GPT-4 Turbo, a new wave of customizable AI through user-friendly APIs, and the promise to protect businesses from copyright infringement claims, stand out as critical moments that are reshaping the landscape of artificial intelligence. As the tech industry digests the implications of these developments, several questions emerge: What do these advancements mean for the future of AI? And how will they reshape the competitive landscape of startups and tech giants alike?

Key Takeaways from OpenAI's DevDay

The announcements from DevDay underscore a dynamic and ever-evolving domain, showcasing OpenAI's commitment to extending the frontiers of AI technology. These are the key revelations:

  • GPT-4 Turbo: An enhanced version of GPT-4 that is both more powerful and more cost-efficient.
  • Customizable Chatbots: OpenAI now allows users to create their own GPT versions for various use cases without any coding knowledge.
  • GPT Store: A new marketplace for user-created AI bots is on the horizon.
  • Assistants API: This new API enables the building of agent-like experiences, broadening the scope of possible AI applications.
  • DALL-E 3 API: OpenAI's text-to-image model is now more accessible, complete with moderation tools.
  • Text-to-Speech APIs: OpenAI introduces a suite of expressive AI voices.
  • Copyright Shield: A pledge to defend businesses from copyright infringement claims linked to the use of OpenAI's tools.

Recommended articles with more details on these announcements can be found on The Verge, and additional coverage on TechCrunch.

Questions Raised by DevDay

The advancements announced at DevDay suggest the next seismic shift in the AI landscape, with OpenAI demonstrating its formidable influence and technological prowess. Notably, OpenAI's move to enable the creation of custom GPT models and their decision to offer a GPT store could also democratize AI development, making sophisticated AI tools more accessible to a broader audience.

However, this democratization comes with its own set of questions. Will this influx of AI capabilities stifle innovation in startups, or will it spur a new wave of creativity? Discussions on Reddit indicate a mixed response from the community, with some lamenting the potential demise of startups that relied on existing gaps in the AI market, while others see it as an evolution that weeds out those unable to adapt and innovate.

Another important implication is the potential for AI models like GPT-4 Turbo to replace certain jobs, as they become more capable and less costly. As the world's most influential AI platform begins to perform complex tasks more efficiently, what will be the societal and economic repercussions?

Furthermore, the Copyright Shield program by OpenAI suggests a world where AI-generated content becomes ubiquitous, potentially challenging our existing norms around intellectual property and copyright law. How will this impact creators and the legal frameworks that protect their work?

The Future of AI: An OpenAI Monopoly?

With these developments, OpenAI continues to cement its position as a leader in the AI space. But does this come at the cost of reduced competition and potential monopolization? As we've seen in other sectors, a dominant player can stifle competition, which is often the lifeblood of innovation. A historical example is the web browser market, where Microsoft's Internet Explorer once held a dominant position. By bundling Internet Explorer with its Windows operating system, Microsoft was able to gain a significant market share, which led to antitrust lawsuits and concerns over lack of competition. This dominance not only discouraged other browser developments but also slowed the pace of innovation within the web browsing experience itself. It wasn't until the rise of competitors like Firefox and Google Chrome that we saw a resurgence in browser innovation and an improvement in user experience.

From this point of view, the move to simplify the use of AI through user-friendly interfaces and APIs is a double-edged sword. On one hand, it enables a wider range of creators and developers to engage with AI technology. On the other, it could concentrate power in the hands of a single entity, controlling the direction and ethics of AI development. This centralization poses potential risks for competitive diversity and requires careful oversight to maintain a healthy, multi-stakeholder ecosystem.

The Rise of GPT-4 Turbo: Job-Insecurities & the Ripple Effect on Startups

The accessibility of advanced AI tools could mean a democratized future where innovation is not the sole province of those with deep pockets or advanced technical skills. It might level the playing field or, as some on Reddit have pointed out, could quash many startups that have thrived in the niches OpenAI now seems prepared to fill. The sentiment shared by the community reflects a broader anxiety permeating the tech industry: the fear of being rendered obsolete by the relentless march of AI progress. The speed at which OpenAI is iterating its models and the scope of their functionality are formidable, to say the least.

With the advent of OpenAI's GPT-4 Turbo, we're forced to confront an uncomfortable question: what happens to human jobs when AI becomes better and cheaper at performing them? The argument in favor of AI equipped with human-like abilities often hinges on the promise of automation enhancing productivity. However, the lower costs associated with AI-driven solutions compared to human labor could incentivize companies to replace their human workforce. With GPT-4 Turbo, not only is the efficiency of tasks expected to increase, but the economic rationale for businesses to adopt AI becomes even more compelling. While it's true that new types of jobs will likely emerge in the wake of AI's rise, the transition could be tumultuous. The risk is that the job market may not adapt quickly enough to absorb the displaced workers, leading to a potential increase in unemployment and the need for large-scale retraining programs.

And it's not just about the jobs that AI can replace, but also about the broader implications for the labor market and society. The possibility of AI surpassing human capabilities in certain sectors raises fundamental questions about the value we place on human labor and the structure of our economy. Can we ensure a fair transition for those whose jobs are at risk? As AI models like GPT-4 Turbo become more ingrained in our economic fabric, these are the urgent questions we must address to ensure that the future of work is equitable for all.

The AI Revolution is Accelerating

The implications of such rapid development in AI are profound. With increased power and reach, comes greater responsibility. OpenAI's commitment to defending businesses from copyright claims raises questions about how AI-generated content will be regulated and the ethical considerations of AI mimicking human creativity. Moreover, as AI becomes more integrated into our lives, the potential for misuse or unintended consequences grows.

OpenAI's DevDay has undoubtedly set a new pace for the AI industry. The implications of these announcements will be felt far and wide, sparking debates on ethics, economics, and the future of innovation. As we grapple with these questions, one thing is clear: the AI revolution is accelerating, and we must prepare for a future that looks markedly different from today's world.

JBang: How to Script With Java for Data Import From an API

It's right in the middle of the busy conference season, and I was prepping for an upcoming conference talk.

As I often do, I went to Neo4j Aura to spin up a free database and use Cypher with APOC to import data from an API, but this API requires a header, and the APOC procedure that adds headers to a request is blocked by security in Aura. Hmm, I needed a new route.

Passkeys: A No-Frills Explainer On The Future Of Password-Less Authentication

Passkeys are a new way of authenticating applications and websites. Instead of having to remember a password, a third-party service provider (e.g., Google or Apple) generates and stores a cryptographic key pair that is bound to a website domain. Since you have access to the service provider, you have access to the keys, which you can then use to log in.

This cryptographic key pair contains both private and public keys that are used for authenticating messages. This scheme is often known as asymmetric or public-key cryptography.

Public and private key pair? Asymmetric cryptography? Like most modern technology, passkeys are described by esoteric verbiage and acronyms that make them difficult to discuss. That’s the point of this article. I want to put the complex terms aside and help illustrate how passkeys work, explain what they are effective at, and demonstrate what it looks like to work with them.

How Passkeys Work

Passkeys are cryptographic keys that rely on generating signatures. A signature is proof that a message is authentic. How so? It happens first by hashing (a fancy term for “obscuring”) the message and then creating a signature from that hash with your private key. The private key in the cryptographic key pair allows the signature to be generated, and the public key, which is shared with others, allows the service to verify that the message did, in fact, come from you.

In short, passkeys consist of two keys: a public and private. One verifies a signature while the other verifies you, and the communication between them is what grants you access to an account.

Here’s a quick way of generating a signing and verification key pair to authenticate a message using the SubtleCrypto API. While this is only part of how passkeys work, it does illustrate how the concept works cryptographically underneath the specification.

const message = new TextEncoder().encode("My message");

const keypair = await crypto.subtle.generateKey(
  { name: "ECDSA", namedCurve: "P-256" },
  true,
  [ 'sign', 'verify' ]
);

const signature = await crypto.subtle.sign(
  { name: "ECDSA", hash: "SHA-256" },
  keypair.privateKey,
  message
);

// Normally, someone else would be doing the verification using your public key
// but it's a bit easier to see it yourself this way
console.log(
  "Did my private key sign this message?",
  await crypto.subtle.verify(
    { name: "ECDSA", hash: "SHA-256" },
    keypair.publicKey,
    signature,
    message
  )
);

Notice the three parts pulling all of this together:

  1. Message: A message is constructed.
  2. Key pair: The public and private keys are generated. One key is used for the signature, and the other is set to do the verification.
  3. Signature: A signature is created with the private key, proving the message’s authenticity.

From there, a third party can use the public key to verify a signature made with the private key, confirming that the two form a matching pair. We’ll get into the weeds of how the keys are generated and used in just a bit, but for now, this is some context as we continue to understand why passkeys can potentially erase the need for passwords.

Why Passkeys Can Replace Passwords

Since the responsibility of storing passkeys is removed and transferred to a third-party service provider, you only have to control the “parent” account in order to authenticate and gain access. This is a lot like requiring single sign-on (SSO) for an account via Google, Facebook, or LinkedIn, but instead, we use an account that has control of the passkey stored for each individual website.

For example, I can use my Google account to store passkeys for somerandomwebsite.com. That allows me to prove a challenge by using that passkey’s private key and thus authenticate and log into somerandomwebsite.com.

For the non-tech savvy, this typically looks like a prompt that the user can click to log in. Since the credentials (i.e., username and password) are tied to the domain name (somerandomwebsite.com), and passkeys created for a domain name are only accessible to the user at login, the user can select which passkey they wish to use for access. This is usually only one login, but in some cases, you can create multiple logins for a single domain and then select which one you wish to use from there.

So, what’s the downside? Having to store additional cryptographic keys for each login and every site for which you have a passkey often requires more space than storing a password. However, I would argue that the security gains, the user experience from not having to remember a password, and the prevention of common phishing techniques more than offset the increased storage space.

How Passkeys Protect Us

Passkeys prevent a couple of security issues that are quite common, specifically leaked database credentials and phishing attacks.

Database Leaks

Have you ever shared a password with a friend or colleague by copying and pasting it for them in an email or text? That could lead to a security leak. So would a hack on a system that stores customer information, like passwords, which is then sold on dark marketplaces or made public. In many cases, it’s a weak set of credentials — like an email and password combination — that can be stolen with a fair amount of ease.

Passkeys technology circumvents this because passkeys only store a public key to an account, and as you may have guessed by the name, this key is expected to be made accessible to anyone who wants to use it. The public key is only used for verification purposes and, for the intended use case of passkeys, is effectively useless without the private key to go with it, as the two are generated as a pair. Therefore, those previous juicy database leaks are no longer useful, as they can no longer be used for cracking the password for your account. Cracking a similar private key would take millions of years at this point in time.

Phishing

Passwords rely on knowing what the password is for a given login: anyone with that same information has the same level of access to the same account as you do. There are sophisticated phishing sites that look like they’re by Microsoft or Google and will redirect you to the real provider after you attempt to log into their fake site. The damage is already done at that point; your credentials are captured, and hopefully, the same credentials weren’t being used on other sites, as now you’re compromised there as well.

A passkey, by contrast, is tied to a domain. You gain a new element of security: the fact that only you have the private key. Since the private key is not feasible to remember nor computationally easy to guess, we can guarantee that you are who you say you are (at least as long as your passkey provider is not compromised). So, that fake phishing site? It will not even show the passkey prompt because the domain is different, which completely mitigates phishing attempts.

There are, of course, theoretical attacks that can make passkeys vulnerable, like someone compromising your DNS server to send you to a domain that now points to their fake site. That said, you probably have deeper issues to concern yourself with if it gets to that point.

Implementing Passkeys

At a high level, a few items are needed to start using passkeys, at least for the common sign-up and log-in process. You’ll need a temporary cache of some sort, such as redis or memcache, for storing temporary challenges that users can authenticate against, as well as a more permanent data store for storing user accounts and their public key information, which can be used to authenticate the user over the course of their account lifetime. These aren’t hard requirements but rather what’s typical of what would be developed for this kind of authentication process.
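
To make that more concrete, here is a minimal, in-memory sketch of a temporary challenge store. The names (ChallengeStore, issue, consume) are illustrative assumptions; a production setup would more likely use Redis or Memcached with a TTL.

// Illustrative in-memory challenge store (assumed names, not a library API).
// Assumes a runtime with the Web Crypto global (modern browsers or recent Node.js).
class ChallengeStore {
  constructor(ttlMs = 5 * 60_000) {
    this.ttlMs = ttlMs;
    this.challenges = new Map(); // sessionId -> { challenge, expiresAt }
  }

  issue(sessionId) {
    const challenge = crypto.getRandomValues(new Uint8Array(32));
    this.challenges.set(sessionId, { challenge, expiresAt: Date.now() + this.ttlMs });
    return challenge;
  }

  // Hands the challenge back exactly once and deletes it so it cannot be replayed.
  consume(sessionId) {
    const entry = this.challenges.get(sessionId);
    this.challenges.delete(sessionId);
    if (!entry || entry.expiresAt < Date.now()) return null;
    return entry.challenge;
  }
}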

To understand passkeys properly, though, we want to work through a couple of concepts. The first concept is what is actually taking place when we generate a passkey. How are passkeys generated, and what are the underlying cryptographic primitives that are being used? The second concept is how passkeys are used to verify information and why that information can be trusted.

Generating Passkeys

A passkey relies on an authenticator to generate the key pair. The authenticator can be either hardware or software. For example, it can be a hardware security key, the operating system’s Trusted Platform Module (TPM), or some other application. In the cases of Android or iOS, we can use the device’s secure enclave.

To connect to an authenticator, we use what’s called the Client to Authenticator Protocol (CTAP). CTAP allows us to connect to hardware over different connections through the browser. For example, we can connect via CTAP using an NFC, Bluetooth, or USB connection. This is useful in cases where we want to log in on one device while another device contains our passkeys, as is the case on some operating systems that do not support passkeys at the time of writing.

A passkey is built on top of another web API called WebAuthn. While the two are very similar, passkeys differ in that they allow for cloud syncing of the cryptographic keys and do not require knowledge of who the user is to log in, as that information is stored in a passkey with its Relying Party (RP) information. The two otherwise share the same flows and cryptographic operations.

Storing Passkeys

Let’s look at an extremely high-level overview of how I’ve stored and kept track of passkeys in my demo repo. This is how the database is structured.

Basically, a users table has public_keys, which, in turn, contains information about the public key, as well as the public key itself.

From there, I’m caching certain information, including challenges to verify authenticity and data about the sessions in which the challenges take place.

Again, this is only a high-level look to give you a clearer idea of what information is stored and how it is stored.
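
Since the article only describes this at a high level, here is a rough, illustrative sketch of the kinds of records involved. The field names are assumptions, not the demo repo’s actual schema.

// Illustrative record shapes only; the real schema in the demo repo may differ.
const exampleUser = {
  id: "user-uuid",
  username: "john",
};

const examplePublicKeyRecord = {
  kid: "credential-id-from-attestation", // the credential id, used as the primary key
  userId: "user-uuid",
  pubkey: "base64url-encoded SPKI public key",
  coseAlg: -7, // e.g., ES256
  attestationObject: "base64url-encoded attestationObject, kept for later checks",
};

const exampleCachedChallenge = {
  sessionId: "session-uuid",
  challenge: "base64url-encoded random challenge",
  expiresAt: 1700000000000, // epoch milliseconds
};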

Verifying Passkeys

There are several entities involved in the passkey process:

  1. The authenticator, which we previously mentioned, generates our key material.
  2. The client that triggers the passkey generation process via the navigator.credentials.create call.
  3. The Relying Party takes the resulting public key from that call and stores it to be used for subsequent verification.

In our case, you are the client and the Relying Party is the website server you are trying to sign up and log into. The authenticator can either be your mobile phone, a hardware key, or some other device capable of generating your cryptographic keys.

Passkeys are used in two phases: the attestation phase and the assertion phase. The attestation phase is likened to a registration that you perform when first signing up for a service. Instead of an email and password, we generate a passkey.

Assertion is similar to logging in to a service after we are registered, and instead of verifying with a username and password, we use the generated passkey to access the service.

Each phase initially requires a random challenge generated by the Relying Party, which is then signed by the authenticator before the client sends the signature back to the Relying Party to prove account ownership.
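
As a rough sketch (not taken from the article’s demo code), the Relying Party might generate such a challenge as random bytes and base64url-encode them before returning them from an endpoint like the /attestation/generate mock used below. The helper name is an assumption.

// Assumed helper: generate a random, base64url-encoded challenge on the Relying Party.
function generateChallenge() {
  const bytes = crypto.getRandomValues(new Uint8Array(32));
  // Base64url-encode the raw bytes so they can travel safely inside JSON.
  return btoa(String.fromCharCode(...bytes))
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

// The server could then respond with: { challenge: generateChallenge() }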

Browser API Usage

We’ll be looking at how the browser constructs and supplies information for passkeys so that you can store and utilize it for your login process. First, we’ll start with the attestation phase and then the assertion phase.

Attest To It

The following shows how to create a new passkey using the navigator.credentials.create API. From it, we receive an AuthenticatorAttestationResponse, and we want to send portions of that response to the Relying Party for storage.

const { challenge } = await (await fetch("/attestation/generate")).json(); // Server call mock to get a random challenge

const options = {
 // Our challenge should be a base64-url encoded string
 challenge: new TextEncoder().encode(challenge),
 rp: {
  id: window.location.host,
  name: document.title,
 },
 user: {
  id: new TextEncoder().encode("my-user-id"),
  name: 'John',
  displayName: 'John Smith',
 },
 pubKeyCredParams: [ // See COSE algorithms for more: https://www.iana.org/assignments/cose/cose.xhtml#algorithms
  {
   type: 'public-key',
   alg: -7, // ES256
  },
  {
   type: 'public-key',
   alg: -257, // RS256
  },
  {
   type: 'public-key',
   alg: -37, // PS256
  },
 ],
 authenticatorSelection: {
  userVerification: 'preferred', // Do you want to use biometrics or a pin?
  residentKey: 'required', // Create a resident key e.g. passkey
 },
 attestation: 'indirect', // indirect, direct, or none
 timeout: 60_000,
};

// Create the credential through the Authenticator
const credential = await navigator.credentials.create({
 publicKey: options
});

// Our main attestation response. See: https://developer.mozilla.org/en-US/docs/Web/API/AuthenticatorAttestationResponse
const attestation = credential.response as AuthenticatorAttestationResponse;

// Now send this information off to the Relying Party
// An unencoded example payload with most of the useful information
const payload = {
 kid: credential.id,
 clientDataJSON: attestation.clientDataJSON,
 attestationObject: attestation.attestationObject,
 pubkey: attestation.getPublicKey(),
 coseAlg: attestation.getPublicKeyAlgorithm(),
};

The AuthenticatorAttestationResponse contains the clientDataJSON as well as the attestationObject. We also have a couple of useful methods that save us from trying to retrieve the public key from the attestationObject and retrieving the COSE algorithm of the public key: getPublicKey and getPublicKeyAlgorithm.

Let’s dig into these pieces a little further.

Parsing The Attestation clientDataJSON

The clientDataJSON object is composed of a few fields we need. We can convert it to a workable object by decoding it and then running it through JSON.parse.

type DecodedClientDataJSON = {
 challenge: string,
 origin: string,
 type: string
};

const decoded: DecodedClientDataJSON = JSON.parse(new TextDecoder().decode(attestation.clientDataJSON));
const {
 challenge,
 origin,
 type
} = decoded;

Now we have a few fields to check against: challenge, origin, type.

Our challenge is the Base64-url encoded string that was passed to the server. The origin is the host (e.g., https://my.passkeys.com) of the server we used to generate the passkey. Meanwhile, the type is webauthn.create. The server should verify that all the values are expected when parsing the clientDataJSON.
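
The server-side stack is not shown in this article, so here is a hedged sketch of what those checks might look like. The function and parameter names are illustrative assumptions, not from any library.

// Assumed helper: verify the decoded clientDataJSON against values the server expects.
function verifyAttestationClientData(decoded, expected) {
  if (decoded.type !== "webauthn.create") {
    throw new Error("Unexpected type");
  }
  if (decoded.origin !== expected.origin) { // e.g., "https://my.passkeys.com"
    throw new Error("Unexpected origin");
  }
  if (decoded.challenge !== expected.challenge) { // the challenge previously issued for this session
    throw new Error("Challenge mismatch");
  }
  return true;
}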

Decoding The attestationObject

The attestationObject is a CBOR encoded object. We need to use a CBOR decoder to actually see what it contains. We can use a package like cbor-x for that.

import { decode } from 'cbor-x/decode';

enum DecodedAttestationObjectFormat {
  none = 'none',
  packed = 'packed',
}
type DecodedAttestationObjectAttStmt = {
  x5c?: Uint8Array[];
  sig?: Uint8Array;
};

type DecodedAttestationObject = {
  fmt: DecodedAttestationObjectFormat;
  authData: Uint8Array;
  attStmt: DecodedAttestationObjectAttStmt;
};

const decodedAttestationObject: DecodedAttestationObject = decode(
 new Uint8Array(attestation.attestationObject)
);

const {
 fmt,
 authData,
 attStmt,
} = decodedAttestationObject;

fmt will often evaluate to "none" here for passkeys. Other types of fmt are generated through other types of authenticators.

Accessing authData

The authData is a buffer of values with the following structure:

  • rpIdHash (32 bytes): The SHA-256 hash of the origin, e.g., my.passkeys.com.
  • flags (1 byte): Flags that determine multiple pieces of information (specification).
  • signCount (4 bytes): This should always be 0000 for passkeys.
  • attestedCredentialData (variable length): This will contain credential data, if available, in a COSE key format.
  • extensions (variable length): Any optional extensions for authentication.

It is recommended to use the getPublicKey method here instead of manually retrieving the attestedCredentialData.
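
For the checks mentioned later (rpIdHash, flags, and signCount), here is a minimal sketch of slicing those fixed-length fields out of authData, following the layout above. As noted, getPublicKey is the easier route for the credential data itself.

// Minimal sketch: pull the fixed-length fields out of authData (a Uint8Array).
function parseAuthData(authData) {
  const rpIdHash = authData.slice(0, 32); // SHA-256 hash of the RP origin
  const flags = authData[32]; // single flags byte
  // signCount is a 4-byte big-endian unsigned integer starting at offset 33.
  const signCount = new DataView(authData.buffer, authData.byteOffset + 33, 4).getUint32(0, false);
  return { rpIdHash, flags, signCount };
}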

A Note About The attStmt Object

This is often an empty object for passkeys. However, in other cases of a packed format, which includes the sig, we will need to perform some authentication to verify the sig. This is out of the scope of this article, as it often requires a hardware key or some other type of device-based login.

Retrieving The Encoded Public Key

The getPublicKey method can retrieve the Subject Public Key Info (SPKI) encoded version of the public key, which is a different format from the COSE key (more on that next) found in the attestedCredentialData of the decodedAttestationObject.authData. The SPKI format has the benefit of being compatible with the Web Crypto importKey function, making it easier to verify assertion signatures in the next phase.

// Example of importing attestation public key directly into Web Crypto
const pubkey = await crypto.subtle.importKey(
  'spki',
  attestation.getPublicKey(),
  { name: "ECDSA", namedCurve: "P-256" },
  true,
  ['verify']
);

Generating Keys With COSE Algorithms

The algorithms that can be used to generate cryptographic material for a passkey are specified by their COSE Algorithm. For passkeys generated for the web, we want to be able to generate keys using the following algorithms, as they are supported natively in Web Crypto. Personally, I prefer ECDSA-based algorithms since the key sizes are quite a bit smaller than RSA keys.

The COSE algorithms are declared in the pubKeyCredParams array passed to navigator.credentials.create. We can retrieve the COSE algorithm that was actually used from the attestation response with the getPublicKeyAlgorithm method. For example, if getPublicKeyAlgorithm returned -7, we’d know that the key used the ES256 algorithm.

  • ES512 (-36): ECDSA w/ SHA-512
  • ES384 (-35): ECDSA w/ SHA-384
  • ES256 (-7): ECDSA w/ SHA-256
  • RS512 (-259): RSASSA-PKCS1-v1_5 using SHA-512
  • RS384 (-258): RSASSA-PKCS1-v1_5 using SHA-384
  • RS256 (-257): RSASSA-PKCS1-v1_5 using SHA-256
  • PS512 (-39): RSASSA-PSS w/ SHA-512
  • PS384 (-38): RSASSA-PSS w/ SHA-384
  • PS256 (-37): RSASSA-PSS w/ SHA-256
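
As an illustrative sketch (an assumption, not part of WebAuthn or Web Crypto themselves), a small helper could map these COSE values to the Web Crypto parameters needed for importKey and verify:

// Assumed helper: map a COSE algorithm identifier to Web Crypto parameters.
function coseAlgToWebCrypto(coseAlg) {
  switch (coseAlg) {
    case -7: // ES256
      return {
        importParams: { name: "ECDSA", namedCurve: "P-256" },
        verifyParams: { name: "ECDSA", hash: "SHA-256" },
      };
    case -257: // RS256
      return {
        importParams: { name: "RSASSA-PKCS1-v1_5", hash: "SHA-256" },
        verifyParams: { name: "RSASSA-PKCS1-v1_5" },
      };
    case -37: // PS256
      return {
        importParams: { name: "RSA-PSS", hash: "SHA-256" },
        verifyParams: { name: "RSA-PSS", saltLength: 32 },
      };
    default:
      throw new Error(`Unsupported COSE algorithm: ${coseAlg}`);
  }
}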

Responding To The Attestation Payload

I want to show you an example of a response we would send to the server for registration. In short, the safeByteEncode function is used to change the buffers into Base64-url encoded strings.

type AttestationCredentialPayload = {
  kid: string;
  clientDataJSON: string;
  attestationObject: string;
  pubkey: string;
  coseAlg: number;
};

const payload: AttestationCredentialPayload = {
  kid: credential.id,
  clientDataJSON: safeByteEncode(attestation.clientDataJSON),
  attestationObject: safeByteEncode(attestation.attestationObject),
  pubkey: safeByteEncode(attestation.getPublicKey() as ArrayBuffer),
  coseAlg: attestation.getPublicKeyAlgorithm(),
};
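
For reference, here is one possible implementation of the safeByteEncode helper used above. This is an assumption; the demo repo’s version may differ.

// Assumed implementation: encode an ArrayBuffer as a base64url string.
function safeByteEncode(buffer) {
  const bytes = new Uint8Array(buffer);
  return btoa(String.fromCharCode(...bytes))
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}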

The credential id (kid) should always be captured to look up the user’s keys, as it will be the primary key in the public_keys table.

From there:

  1. The server would check the clientDataJSON to ensure the same challenge is used.
  2. The origin is checked, and the type is confirmed to be webauthn.create.
  3. We check the attestationObject to ensure it has an fmt of none, and we inspect the rpIdHash, flags, and signCount in the authData.

Optionally, we could check to see if the attestationObject.attStmt has a sig and verify the public key against it, but that’s for other types of WebAuthn flows we won’t go into.

We should store the public key and the COSE algorithm in the database at the very least. It is also beneficial to store the attestationObject in case we require more information for verification. The signCount is incremented on every login attempt if you support other types of WebAuthn logins; otherwise, it should always be 0000 for a passkey.

Asserting Yourself

Now we have to retrieve a stored passkey using the navigator.credentials.get API. From it, we receive the AuthenticatorAssertionResponse, which we want to send portions of to the Relying Party for verification.

const { challenge } = await (await fetch("/assertion/generate")).json(); // Server call mock to get a random challenge

const options = {
  challenge: new TextEncoder().encode(challenge),
  rpId: window.location.host,
  timeout: 60_000,
};

// Sign the challenge with our private key via the Authenticator
const credential = await navigator.credentials.get({
  publicKey: options,
  mediation: 'optional',
});

// Our main assertion response. See: <https://developer.mozilla.org/en-US/docs/Web/API/AuthenticatorAssertionResponse>
const assertion = credential.response as AuthenticatorAssertionResponse;

// Now send this information off to the Relying Party
// An example payload with most of the useful information
const payload = {
  kid: credential.id,
  clientDataJSON: safeByteEncode(assertion.clientDataJSON),
  authenticatorData: safeByteEncode(assertion.authenticatorData),
  signature: safeByteEncode(assertion.signature),
};

The AuthenticatorAssertionResponse again has the clientDataJSON, and now the authenticatorData. We also have the signature that needs to be verified with the stored public key we captured in the attestation phase.

Decoding The Assertion clientDataJSON

The assertion clientDataJSON is very similar to the attestation version. We again have the challenge, origin, and type. Everything is the same, except the type is now webauthn.get.

type DecodedClientDataJSON = {
  challenge: string,
  origin: string,
  type: string
};

const decoded: DecodedClientDataJSON = JSON.parse(new TextDecoder().decode(assertion.clientDataJSON));
const {
  challenge,
  origin,
  type
} = decoded;

Understanding The authenticatorData

The authenticatorData is similar to the previous attestationObject.authData, except it no longer includes the public key (i.e., the attestedCredentialData) or any extensions.

  • rpIdHash (32 bytes): The SHA-256 hash of the origin, e.g., my.passkeys.com.
  • flags (1 byte): Flags that determine multiple pieces of information (specification).
  • signCount (4 bytes): This should always be 0000 for passkeys, just as it should be for authData.

Verifying The signature

The signature is what we need to verify that the user trying to log in holds the private key. It is computed over the concatenation of the authenticatorData and the clientDataHash (i.e., the SHA-256 hash of clientDataJSON).

To verify with the public key, we need to also concatenate the authenticatorData and clientDataHash. If the verification returns true, we know that the user is who they say they are, and we can let them authenticate into the application.

Here’s an example of how this is calculated:

const clientDataHash = await crypto.subtle.digest(
  'SHA-256',
  assertion.clientDataJSON
);
// For concatBuffer see: <https://github.com/nealfennimore/passkeys/blob/main/src/utils.ts#L31>
const data = concatBuffer(
  assertion.authenticatorData,
  clientDataHash
);

// NOTE: The signature from the assertion is in ASN.1 DER encoding. To get it working
// with Web Crypto, we need to transform it into r|s encoding, which is specific to
// ECDSA algorithms.
//
// For fromAsn1DERtoRSSignature see: <https://github.com/nealfennimore/passkeys/blob/main/src/crypto.ts#L60>
const isVerified = await crypto.subtle.verify(
  { name: 'ECDSA', hash: 'SHA-256' },
  pubkey,
  fromAsn1DERtoRSSignature(assertion.signature, 256),
  data
);

Sending The Assertion Payload

Finally, we get to send a response to the server with the assertion for logging into the application.

type AssertionCredentialPayload = {
  kid: string;
  clientDataJSON: string;
  authenticatorData: string;
  signature: string;
};

const payload: AssertionCredentialPayload = {
  kid: credential.id,
  clientDataJSON: safeByteEncode(assertion.clientDataJSON),
  authenticatorData: safeByteEncode(assertion.authenticatorData),
  signature: safeByteEncode(assertion.signature),
};

To complete the assertion phase, we first look up the stored public key by its credential id (kid).

Next, we verify the following:

  • clientDataJSON again to ensure the same challenge is used,
  • The origin is the same, and
  • That the type is webauthn.get.

The authenticatorData can be used to check the rpIdHash, flags, and the signCount one more time. Finally, we take the signature and ensure that the stored public key can be used to verify that the signature is valid.
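
As a hedged sketch of the rpIdHash portion of that check (the helper name is an assumption, not from the demo repo), the expected RP ID is hashed with SHA-256 and compared byte for byte against the first 32 bytes of authenticatorData:

// Assumed helper: confirm that authenticatorData (a Uint8Array) was produced for the expected RP ID.
async function verifyRpIdHash(authenticatorData, expectedRpId) {
  const expectedHash = new Uint8Array(
    await crypto.subtle.digest("SHA-256", new TextEncoder().encode(expectedRpId))
  );
  const rpIdHash = authenticatorData.slice(0, 32);
  return expectedHash.length === rpIdHash.length &&
    expectedHash.every((byte, i) => byte === rpIdHash[i]);
}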

At this point, if all went well, the server should have verified all the information and allowed you to access your account! Congrats — you logged in with passkeys!

No More Passwords?

Do passkeys mean the end of passwords? Probably not… at least for a while anyway. Passwords will live on. However, there’s hope that more and more of the industry will begin to use passkeys. You can already find it implemented in many of the applications you use every day.

Passkeys are not the only implementation to rely on cryptographic means of authentication. A notable example is SQRL (pronounced “squirrel”). The industry as a whole, however, has decided to move forward with passkeys.

Hopefully, this article demystified some of the internal workings of passkeys. The industry as a whole is going to be using passkeys more and more, so it’s important to at least get acclimated. With all the security gains that passkeys provide and their resistance to phishing attacks, we can at least be more at ease browsing the internet when using them.

A High-Level Overview Of Large Language Model Concepts, Use Cases, And Tools

Even though a simple online search turns up countless tutorials on using Artificial Intelligence (AI) for everything from generative art to making technical documentation easier to use, there’s still plenty of mystery around it. What goes inside an AI-powered tool like ChatGPT? How does Notion’s AI feature know how to summarize an article for me on the fly? Or how are a bunch of sites suddenly popping up that can aggregate news and auto-publish a slew of “new” articles from it?

It all can seem like a black box of mysterious, arcane technology that requires an advanced computer science degree to understand. What I want to show you, though, is how we can peek inside that box and see how everything is wired up.

Specifically, this article is about large language models (LLMs) and how they “imbue” AI-powered tools with intelligence for answering queries in diverse contexts. I have previously written tutorials on how to use an LLM to transcribe and evaluate the expressed sentiment of audio files. But I want to take a step back and look at another way around it that better demonstrates — and visualizes — how data flows through an AI-powered tool.

We will discuss LLM use cases, look at several new tools that abstract the process of modeling AI with LLM with visual workflows, and get our hands on one of them to see how it all works.

Large Language Models Overview

Forgoing technical terms, LLMs are vast sets of text data. When we integrate an LLM into an AI system, we enable the system to leverage the language knowledge and capabilities developed by the LLM through its own training. You might think of it as dumping a lifetime of knowledge into an empty brain, assigning that brain to a job, and putting it to work.

“Knowledge” is a convoluted term as it can be subjective and qualitative. We sometimes describe people as “book smart” or “street smart,” and they are both types of knowledge that are useful in different contexts. This is what artificial “intelligence” is created upon. AI is fed with data, and that is what it uses to frame its understanding of the world, whether it is text data for “speaking” back to us or visual data for generating “art” on demand.

Use Cases

As you may imagine (or have already experienced), the use cases of LLMs in AI are many and along a wide spectrum. And we’re only in the early days of figuring out what to make with LLMs and how to use them in our work. A few of the most common use cases include the following.

  • Chatbot
    LLMs play a crucial role in building chatbots for customer support, troubleshooting, and interactions, thereby ensuring smooth communications with users and delivering valuable assistance. Salesforce is a good example of a company offering this sort of service.
  • Sentiment Analysis
    LLMs can analyze text for emotions. Organizations use this to collect data, summarize feedback, and quickly identify areas for improvement. Grammarly’s “tone detector” is one such example, where AI is used to evaluate sentiment conveyed in content.
  • Content Moderation
    Content moderation is an important aspect of social media platforms, and LLMs come in handy. They can spot and remove offensive content, including hate speech, harassment, or inappropriate photos and videos, which is exactly what Hubspot’s AI-powered content moderation feature does.
  • Translation
    Thanks to impressive advancements in language models, translation has become highly accurate. One noteworthy example is Meta AI’s latest model, SeamlessM4T, which represents a big step forward in speech-to-speech and speech-to-text technology.
  • Email Filters
    LLMs can be used to automatically detect and block unwanted spam messages, keeping your inbox clean. When trained on large datasets of known spam emails, the models learn to identify suspicious links, phrases, and sender details. This allows them to distinguish legitimate messages from those trying to scam users or market illegal or fraudulent goods and services. Google has offered AI-based spam protection since 2019.
  • Writing Assistance
    Grammarly is the ultimate example of an AI-powered service that uses an LLM to “learn” how you write in order to make writing suggestions. But this extends to other services as well, including Gmail’s “Smart Reply” feature. The same thing is true of Notion’s AI feature, which is capable of summarizing a page of content or meeting notes. Hemingway’s app recently shipped a beta AI integration that corrects writing on the spot.
  • Code and Development
    This is the one that has many developers worried about AI coming after their jobs. It hit the commercial mainstream with GitHub Copilot, a service that performs automatic code completion. Same with Amazon’s CodeWhisperer. Then again, AI can be used to help sharpen development skills, which is the case of MDN’s AI Help feature.

Again, these are still the early days of LLM. We’re already beginning to see language models integrated into our lives, whether it’s in our writing, email, or customer service, among many other services that seem to pop up every week. This is an evolving space.

Types Of Models

There are all kinds of AI models tailored for different applications. You can scroll through Sapling’s large list of the most prominent commercial and open-source LLMs to get an idea of all the diverse models that are available and what they are used for. Each model is the context in which AI views the world.

Let’s look at some real-world examples of how LLMs are used for different use cases.

Natural Conversation
Chatbots need to master the art of conversation. Models like Anthropic’s Claude are trained on massive collections of conversational data to chat naturally on any topic. As a developer, you can tap into Claude’s conversational skills through an API to create interactive assistants.

Emotions
Developers can leverage powerful pre-trained models like Falcon for sentiment analysis. By fine-tuning Falcon on datasets with emotional labels, it can learn to accurately detect the sentiment in any text provided.

Translation
Meta AI released SeamlessM4T, an LLM trained on huge translated speech and text datasets. This multilingual model is groundbreaking because it translates speech from one language into another without an intermediary step between input and output. In other words, SeamlessM4T enables real-time voice conversations across languages.

Content Moderation
As a developer, you can integrate powerful moderation capabilities using OpenAI’s API, which includes an LLM trained thoroughly on flagging toxic content for the purpose of community moderation.

Spam Filtering
Some LLMs are used to develop AI programs capable of text classification tasks, such as spotting spam emails. As an email user, the simple act of flagging certain messages as spam further informs AI about what constitutes an unwanted email. After seeing plenty of examples, AI is capable of establishing patterns that allow it to block spam before it hits the inbox.

Not All Language Models Are Large

While we’re on the topic, it’s worth mentioning that not all language models are “large.” There are plenty of models with smaller sets of data that may not go as deep as ChatGPT 4 or 5 but are well-suited for personal or niche applications.

For example, check out the chat feature that Luke Wroblewski added to his site. He’s using a smaller language model, so the app at least knows how to form sentences, but is primarily trained on Luke’s archive of blog posts. Typing a prompt into the chat returns responses that read very much like Luke’s writings. Better yet, Luke’s virtual persona will admit when a topic is outside of the scope of its knowledge. An LLM would provide the assistant with too much general information and would likely try to answer any question, regardless of scope. Members from the University of Edinburgh and the Allen Institute for AI published a paper in January 2023 (PDF) that advocates the use of specialized language models for the purpose of more narrowly targeted tasks.

Low-Code Tools For LLM Development

So far, we’ve covered what an LLM is, common examples of how it can be used, and how different models influence the AI tools that integrate them. Let’s discuss that last bit about integration.

Many technologies come with a steep learning curve. That’s especially true of emerging tools that introduce new technical concepts, as I would argue is the case with AI in general. While AI is not a new term and has been studied and developed in various forms over decades, its entrance into the mainstream is certainly new, and it has sparked plenty of buzz, including in the front-end development community, where many of us are scrambling to wrap our minds around it.

Thankfully, new resources can help abstract all of this for us. They can power an AI project you might be working on, but more importantly, they are useful for learning the concepts of LLM by removing advanced technical barriers. You might think of them as “low” and “no” code tools, like WordPress.com vs. self-hosted WordPress or a visual React editor that is integrated with your IDE.

Low-code platforms make it easier to leverage large language models without needing to handle all the coding and infrastructure yourself. Here are some top options:

Chainlit

Chainlit is an open-source Python package that is capable of building a ChatGPT-style interface using a visual editor.
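For a sense of how little code that takes, here is a minimal Chainlit sketch (assuming the chainlit package is installed; the exact handler signature can vary between Chainlit versions, and a real app would call an LLM instead of echoing):

# chat.py — run with: chainlit run chat.py
import chainlit as cl

@cl.on_message
async def respond(message: cl.Message):
    # Echo the user's message back; swap this for a call to your language model.
    await cl.Message(content=f"You said: {message.content}").send()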

LLMStack

LLMStack is another low-code platform for building AI apps and chatbots by leveraging large language models. Multiple models can be chained together into “pipelines” for channeling data. LLMStack supports standalone app development but also provides hosting that can be used to integrate an app into sites and products via API or connected to platforms like Slack or Discord.

LLMStack is also what powers Promptly, a cloud version of the app with freemium subscription pricing that includes a free tier.

Stack AI

Stack AI is another no-code offering for developing AI apps integrated with LLMs. It is much like FlowiseAI, particularly the drag-and-drop interface that visualizes connections between apps and APIs. One thing I particularly like about Stack AI is how it incorporates “data loaders” to fetch data from other platforms, like Slack or a Notion database.

I also like that Stack AI provides a wider range of LLM offerings. That said, it will cost you. While Stack AI offers a free pricing tier, it is restricted to a single project with only 100 runs per month. Bumping up to the first paid tier will set you back $199 per month, which I suppose is used toward the costs of accessing a wider range of LLM sources. For example, Flowise AI works with any LLM in the Hugging Face community. So does Stack AI, but it also gives you access to commercial LLM offerings, like Anthropic’s Claude models and Google’s PaLM, as well as additional open-source offerings from Replicate.


Install FlowiseAI

First things first, we need to get FlowiseAI up and running. FlowiseAI is an open-source application that can be installed from the command line.

You can install it with the following command:

npm install -g flowise

Once installed, start up Flowise with this command:

npx flowise start

From here, you can access FlowiseAI in your browser at localhost:3000.

It’s possible to serve FlowiseAI so that you can access it online and provide access to others, which is well-covered in the documentation.

Setting Up Retrievers

Retrievers are templates that the multi-prompt chain will query.

Different retrievers provide different templates that query different things. In this case, we want to select the Prompt Retriever because it is designed to retrieve documents like PDF, TXT, and CSV files. Unlike other types of retrievers, the Prompt Retriever does not actually need to store those documents; it only needs to fetch them.

Let’s take the first step toward creating our career assistant by adding a Prompt Retriever to the FlowiseAI canvas. The “canvas” is the visual editing interface we’re using to cobble the app’s components together and see how everything connects.

Adding the Prompt Retriever requires us to first navigate to the Chatflow screen, which is actually the initial page when first accessing FlowiseAI following installation. Click the “Add New” button located in the top-right corner of the page. This opens up the canvas, which is initially empty.

The “Plus” (+) button is what we want to click to open up the library of items we can add to the canvas. Expand the Retrievers tab, then drag and drop the Prompt Retriever to the canvas.

The Prompt Retriever takes three inputs:

  1. Name: The name of the stored prompt;
  2. Description: A brief description of the prompt (i.e., its purpose);
  3. Prompt system message: The initial prompt message that provides context and instructions to the system.

Our career assistant will provide career suggestions, tool recommendations, salary information, and cities with matching jobs. We can start by configuring the Prompt Retriever for career suggestions. Here is placeholder content you can use if you are following along:

  • Name: Career Suggestion;
  • Description: Suggests careers based on skills and experience;
  • Prompt system message: You are a career advisor who helps users identify a career direction and upskilling opportunities. Be clear and concise in your recommendations.

Be sure to repeat this step three more times to create each of the following:

  • Tool recommendations,
  • Salary information,
  • Locations.

Adding A Multi-Prompt Chain

A Multi-Prompt Chain is a class that consists of two or more prompts that are connected together to establish a conversation-like interaction between the user and the career assistant.

The idea is that we combine the four prompts we’ve already added to the canvas and connect them to the proper tools (i.e., chat models) so that the career assistant can prompt the user for information and collect that information in order to process it and return the generated career advice. It’s sort of like a normal system prompt but with a conversational interaction.

The Multi-Prompt Chain node can be found in the “Chains” section of the same inserter we used to place the Prompt Retriever on the canvas.

Once the Multi-Prompt Chain node is added to the canvas, connect it to the prompt retrievers. This enables the chain to receive user responses and employ the most appropriate language model to generate responses.

To connect, click the tiny dot next to the “Prompt Retriever” label on the Multi-Prompt Chain and drag it to the “Prompt Retriever” dot on each Prompt Retriever to draw a line between the chain and each prompt retriever.

Integrating Chat Models

This is where we start interacting with LLMs. In this case, we will integrate Anthropic’s Claude chat model. Claude is a powerful LLM designed for tasks related to complex reasoning, creativity, thoughtful dialogue, coding, and detailed content creation. You can get a feel for Claude by registering for access to interact with it, similar to how you’ve played around with OpenAI’s ChatGPT.

From the inserter, open “Chat Models” and drag the ChatAnthropic option onto the canvas.

Once the ChatAnthropic chat model has been added to the canvas, connect its node to the Multi-Prompt Chain’s “Language Model” node to establish a connection.

It’s worth noting at this point that Claude requires an API key in order to access it. Sign up on the Anthropic website to create a new API key. Once you have one, provide it to the Multi-Prompt Chain in the “Connect Credential” field.

Adding A Conversational Agent

The Agent component in FlowiseAI allows our assistant to do more tasks, like accessing the internet and sending emails.

It connects external services and APIs, making the assistant more versatile. For this project, we will use a Conversational Agent, which can be found in the inserter under “Agent” components.

Once the Conversational Agent has been added to the canvas, connect it to the Chat Model to “train” the model on how to respond to user queries.

Integrating Web Search Capabilities

The Conversational Agent requires additional tools and memory. For example, we want to enable the assistant to perform Google searches to obtain information it can use to generate career advice. The Serp API node can do that for us and is located under “Tools” in the inserter.

Like Claude, Serp API requires an API key to be added to the node. Register with the Serp API site to create an API key. Once the API is configured, connect Serp API to the Conversational Agent’s “Allowed Tools” node.

Building In Memory

The Memory component enables the career assistant to retain conversation information.

This way, the app remembers the conversation and can reference it during the interaction or even to inform future interactions.

There are different types of memory, of course. Several of the options in FlowiseAI require additional configurations, so for the sake of simplicity, we are going to add the Buffer Memory node to the canvas. It is the most general type of memory provided by LangChain, taking the raw input of the past conversation and storing it in a history parameter for reference.
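Since FlowiseAI’s node wraps LangChain’s buffer memory, here is roughly what that component does behind the scenes, sketched with the Python version of LangChain (the sample exchange is invented):

from langchain.memory import ConversationBufferMemory

# Buffer memory keeps the raw conversation and hands it back on request.
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

memory.save_context(
    {"input": "I know HTML, CSS, and a little JavaScript."},
    {"output": "Front-end development roles could be a good fit."},
)

print(memory.load_memory_variables({}))  # {'chat_history': [...the stored messages...]}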

Buffer Memory connects to the Conversational Agent’s “Memory” node.

The Final Workflow

At this point, our workflow looks something like this:

  • Four prompt retrievers that provide the prompt templates for the app to converse with the user.
  • A multi-prompt chain connected to each of the four prompt retrievers that chooses the appropriate tools and language models based on the user interaction.
  • The Claude language model connected to the multi-prompt chain to “train” the app.
  • A conversational agent connected to the Claude language model to allow the app to perform additional tasks, such as Google web searches.
  • Serp API connected to the conversational agent to perform bespoke web searches.
  • Buffer memory connected to the conversational agent to store, i.e., “remember,” conversations.

If you haven’t done so already, this is a great time to save the project and give it a name like “Career Assistant.”

Final Demo

Watch the following video for a quick demonstration of the final workflow we created together in FlowiseAI. The prompts lag a little bit, but you should get the idea of how all of the components we connected are working together to provide responses.

Conclusion

As we wrap up this article, I hope that you’re more familiar with the concepts, use cases, and tools of large language models. LLMs are a key component of AI because they are the “brains” of the application, providing the lens through which the app understands how to interact with and respond to human input.

We looked at a wide variety of use cases for LLMs in an AI context, from chatbots and language translations to writing assistance and summarizing large blocks of text. Then, we demonstrated how LLMs fit into an AI application by using FlowiseAI to create a visual workflow. That workflow not only provided a visual of how an LLM, like Claude, informs a conversation but also how it relies on additional tools, such as APIs, for performing tasks as well as memory for storing conversations.

The career assistant tool we developed together in FlowiseAI was a detailed visual look inside the black box of AI, providing us with a map of the components that feed the app and how they all work together.

Now that you know the role that LLMs play in AI, what sort of models would you use? Is there a particular app idea you have where a specific language model would be used to train it?

The Ultimate Guide to API vs. SDK: What’s the Difference and How To Use Them?

First, let’s figure out the terms we are using and why APIs and SDKs are even paired together.

What Is an API?

API (an acronym for Application Programming Interface) is an interface that enables intercommunication between two applications. It includes a standardized set of rules that define how this interaction takes place, i.e., what kind of information to exchange, what actions to carry out, and so on.

Monitor API Health Check With Prometheus

APISIX has a health check mechanism that proactively checks the health status of the upstream nodes in your system. APISIX also integrates with Prometheus through a plugin that exposes health check metrics for upstream nodes (the multiple instances of a backend API service that APISIX manages) on the Prometheus metrics endpoint, typically at the URL path /apisix/prometheus/metrics.

In this article, we'll guide you on how to enable and monitor API health checks using APISIX and Prometheus.
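To give a feel for the configuration involved, here is a rough sketch of the Admin API calls that set up an upstream with active health checks and a route with the prometheus plugin. Ports, the admin key, node addresses, and IDs are placeholders; check the APISIX documentation for your version before relying on any of them.

import requests

ADMIN_API = "http://127.0.0.1:9180/apisix/admin"   # placeholder Admin API address
HEADERS = {"X-API-KEY": "your-admin-key"}          # placeholder admin key

# Upstream with two backend nodes and an active health check.
upstream = {
    "type": "roundrobin",
    "nodes": {"127.0.0.1:8081": 1, "127.0.0.1:8082": 1},
    "checks": {
        "active": {
            "http_path": "/health",
            "healthy": {"interval": 2, "successes": 1},
            "unhealthy": {"interval": 1, "http_failures": 2},
        }
    },
}
requests.put(f"{ADMIN_API}/upstreams/1", json=upstream, headers=HEADERS)

# Route that uses the upstream and exposes metrics via the prometheus plugin.
route = {
    "uri": "/api/*",
    "upstream_id": "1",
    "plugins": {"prometheus": {}},
}
requests.put(f"{ADMIN_API}/routes/1", json=route, headers=HEADERS)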

API’s Role in Digital Government, 10 National Best Practices

As the digital revolution reshapes government operations worldwide, Application Programming Interfaces (APIs) have emerged as a critical tool in driving digital transformation. Through APIs, governments can ensure smoother interoperability between various systems, facilitate data sharing, and innovate public services. Here, we look at 10 best practices for using APIs in digital government based on national examples from around the globe.

“APIs are instrumental in public digital service provision for their connective nature,” according to the publication Application Programming Interfaces in Governments: Why, What, and How.

Finding Enabled PHP Functions In Your WordPress Hosting Using phpinfo()

WordPress runs on PHP, so as a WordPress developer, it’s important to understand the PHP functions enabled on the server that hosts your site(s).

Do you need to set up a plugin or configure an application on a WordPress site and are wondering if a certain PHP function or library is enabled on your server (e.g. cURL)?

In this tutorial, we’ll show you a quick and easy way to find enabled PHP functions on your server using the phpinfo() function in WordPress. We also provide a comprehensive glossary of these PHP functions for reference, and to help you better understand the backend of your WordPress sites.

What is phpinfo()?

The phpinfo() function is a built-in PHP function that provides a long list of detailed information about the PHP installation and configuration settings on your server, including all the loaded extensions.

When phpinfo() is called and executed, it generates a comprehensive HTML page that displays various aspects of the PHP environment, including PHP version, extensions, directives, environment variables, and more.

The phpinfo() function outputs information in a tabular format, making it easy to navigate and understand the PHP configuration. This knowledge allows you to leverage the available functions on your hosting environment and optimize your WordPress development process.

The information displayed by phpinfo() can be categorized into different sections that provide specific details about a particular aspect of the PHP environment.

Some common information you can find using phpinfo() includes:

1. PHP version: The version of PHP running on the server.
2. Configuration settings: Various settings and directives defined in the PHP configuration file (php.ini).
3. Extensions: A list of loaded PHP extensions and their configurations.
4. Environment variables: Server environment variables and their values.
5. PHP variables: Information about predefined PHP variables, such as `$_SERVER`, `$_GET`, `$_POST`, etc.
6. HTTP headers: HTTP request and response headers.

For a list of all PHP functions enabled on your server, see the tutorial below.

Notes:

  • Use phpinfo() with caution. While it provides valuable information for development and troubleshooting purposes, it should not be left accessible on a production server. After obtaining the necessary information, we recommend removing or commenting out the phpinfo() function call for security purposes.
  • If you only need to know which version of PHP your server is currently running, you can skip the tutorial below and simply go to the Hosting > Overview tab in The Hub.
Check which version of PHP your server is running from The Hub’s Hosting > Overview screen.

How to Find Enabled PHP Functions On Your Server Using phpinfo()

By following the steps outlined in this tutorial, you will learn how to easily retrieve a list of information showing all the enabled PHP functions and extensions on your server.

For this tutorial, we’ll show you how to access the list of PHP functions for a WordPress site set up on WPMU DEV hosting. Note that different hosting environments may use different tools and methods to display this information. Reach out to your hosting support if you have any questions or need help.

Step 1: Access your WordPress site’s files

To begin, you need to be able to access the WordPress site’s files stored on your server. You can do this either via FTP or using our File Manager tool.

Step 2: Create a PHP file

Next, create a PHP file using a text editor and add the phpinfo() function shown below:

<?php
phpinfo();

You can name this PHP file anything you like. In the example below, we’ve named the file ‘info.php’ (note: avoid using an existing filename found in the root folder of your WordPress installation to prevent overwriting the original file).

Create a PHP file to call the phpinfo() function.

Save your PHP file and close your text editor.

Step 3: Upload the file to your server

Locate the root directory of your WordPress installation, where the main files like wp-config.php and index.php are located, and upload your file to this folder.

As mentioned earlier, you can do this easily using our File Manager tool.

Upload the file to the root directory of your WordPress installation.

Step 4: Access the phpinfo() output

Open your WordPress site in a web browser and enter the URL of the uploaded PHP file to generate a PHP function report.

You should see the PHP information displayed. The output will contain detailed information about the PHP configuration, including all enabled functions on your server.

PHP function report.

Step 5: Locate the enabled PHP functions

Scroll down the phpinfo() output to find a specific function. Typically, you will find a list of all enabled PHP functions along with their respective settings and configurations in the section labeled “Core.”

That’s all there is to it!

Refer to the Glossary section below if you need to look up any of the functions listed in your generated PHP function report.

Glossary of PHP Functions

This glossary provides a list of various PHP functions and their applications. Feel free to bookmark this page and use it as a quick reference guide to better understand the backend of your WordPress sites.

Configuration

These settings deal with configuring PHP to work with the web server and with defining settings within your PHP scripts.

  • bcmath – This module enables arbitrary precision mathematics in PHP.
  • calendar – This function of PHP allows conversions between various calendar formats.
  • cgi-fcgi – Command for PHP when run in CGI or FastCGI mode.

Core

These are basic PHP functions and classes that form the core of the PHP language.

  • ctype – A library of PHP that checks if the data type of a variable is a valid character type.
  • curl – Used for transferring data with URLs and is the backbone of multiple functions in PHP.
  • date – A group of functions that let you retrieve or format the local or GMT date and time in PHP.
  • dom – A PHP extension that provides a robust, powerful DOM (Document Object Model) XML API.
  • exif – PHP function used to work with image metadata.
  • FFI – Foreign Function Interface is an extension that provides a simple way to call native functions, access native variables, and create/access data structures defined in C libraries.
  • fileinfo – A PHP extension that helps you to identify a file’s mime type.
  • filter – This function filters data by either validating or sanitizing it which aids in securing a PHP application.
  • ftp – FTP PHP functions help establish a connection to a remote FTP server, a crucial part of file sharing.
  • gd – A library used for dynamic image creation.
  • gettext – An extension aimed at the internationalization of PHP scripts by providing translation support.
  • gmp – This is a PHP extension for arbitrary precision mathematics.
  • hash – This function is used to generate a hash value from a string.
  • iconv – Provides an interface to the GNU iconv library, which provides conversion of character sets.
  • igbinary – An alternative to PHP serializer with better performance and smaller size.
  • imagick – A PHP extension that allows working with ImageMagick, a robust software suite to create, edit, and compose images.
  • imap – This function provides an API for talking to the internet mail servers using PHP.
  • intl – This extension helps to perform UCA-conformant collation and date/time/number/currency formatting in PHP.
  • json – JSON functions in PHP allow for encoding and decoding JSON data.
  • ldap – LDAP functions connect, bind and disconnect from an LDAP directory.
  • libxml – A foundation library that offers a set of APIs for manipulating XML, including parsing XML documents and support for other document types like HTML.
  • mbstring – A non-binary string handling extension that provides multibyte specific string functions.
  • mcrypt – Provides a variety of encryption functions.
  • memcache – The Memcache module provides a handy procedural and object-oriented interface to memcached, a high-performance, distributed memory object caching system that is generic in nature but intended to speed up dynamic web applications by alleviating database load.
  • memcached – An extension for interfacing with memcached via libmemcached library.
  • msgpack – Provides an interface to msgpack.org, which is a binary-based efficient object serialization library.
  • mysqli – A database driver used to interact with MySQL databases.
  • mysqlnd – It’s the MySQL native driver for PHP.
  • openssl – A robust PHP function used for generating and verifying digital signatures.
  • pcre – Provides functions for ‘perl-compatible regular expressions’.
  • PDO – PHP Data Objects is a database access layer providing a uniform method of access to multiple databases.
  • pdo_mysql – A driver that implements the PHP Data Object (PDO) interface to enable access to MySQL databases.
  • Phar – An archive format combined with a runtime library to help build and load PHP applications bundled into a single file.
  • posix – Accessors to the POSIX (Unix) system calls.
  • readline – Provides interactive line editing capabilities and history functions.
  • redis – A PHP extension for interfacing with Redis, a high performance key-value storage service.
  • Reflection – A PHP extension that allows inspection and reverse-engineering of PHP programs using a process called “reflection.”
  • session – This function enables user session management.
  • shmop – A simple interface for accessing shared memory segments in PHP.
  • SimpleXML – An extension that simplifies the work of reading XML files.
  • soap – SoapClient is a PHP built-in class providing methods for sending SOAP requests and receiving SOAP responses from a URL.
  • sockets – PHP socket functions let you create and manage network sockets, low-level network communications between servers.
  • sodium – Sodium is a modern, easy-to-use software library for encryption, decryption, signatures, password hashing and more.
  • SPL – The Standard PHP Library (SPL) is a collection of interfaces and classes that are part of PHP core.
  • standard – These are built-in functions and classes provided by PHP which do not rely on external dependencies or PHP extensions.
  • sysvmsg – It provides an interface to System V message queues.
  • sysvsem – Offers access to System V semaphores.
  • sysvshm – Provides shared memory functions.
  • tokenizer – The tokenizer functions provide an API that allows converting PHP source code into an array of tokens.
  • xml – XML (eXtensible Markup Language) Parser functions let you parse XML documents.
  • xmlreader – An extension that provides a pull parser for reading XML documents as a stream.
  • xmlrpc – PHP implementation of XML-RPC protocol used in exchanging data across a network.
  • xmlwriter – An extension to create XML documents using a simple ‘constructor-like’ style.
  • xsl – XSL is a language for expressing style sheets to transform XML documents into other XML documents.
  • Zend OPcache – An open-source component that improves PHP performance by storing pre-compiled script bytecode in shared memory, thereby removing the need for PHP to load and parse scripts on each request.
  • zip – PHP zip extension is used to read, write, and manipulate zip archives.
  • zlib – Provides access to zlib compression library.

Additional Modules

These are the underlying software components or libraries that add various functionalities to the PHP scripting language.

Environment

Involved in setting up, configuring, and managing the PHP runtime environment.

PHP Variables

These are used to store data which can be modified during the execution of your script.

DIY or Use Our Support Team

Learning PHP will enhance your ability to troubleshoot, modify and optimize your WordPress site.

Hopefully, this tutorial will come in handy if you need to do a quick look up of your server’s enabled PHP functions. Of course, if your WordPress sites are hosted with WPMU DEV, you can always reach out to our 24/7 support team for expert assistance on anything WordPress and hosting related, or get instant answers with our AI-powered assistant.

And rest assured, our hosting is configured to work with just about all WordPress plugins, applications, and configurations.

API and Database Performance Optimization Strategies

In today's fast-paced digital landscape, performance optimization plays a pivotal role in ensuring the success of applications that rely on the integration of APIs and databases. Efficient and responsive API and database integration is vital for achieving high-performing applications. Poorly optimized performance can lead to sluggish response times, scalability challenges, and even user dissatisfaction.

By focusing on performance optimization, developers can enhance the speed, scalability, and overall efficiency of their applications. This not only leads to improved user satisfaction but also lays the foundation for future growth and success. Investing time and effort in optimizing performance upfront can prevent performance bottlenecks, reduce maintenance costs, and provide a solid framework for handling increased user loads as the application scales.

The UX Of Flight Searches: How We Challenged Industry Standards

The topic of flight search has been on our workbench before. Back in 2015, part of our team worked on the design strategy of the Lufthansa Group. In 2017, airberlin became one of our first clients. Together with the team, we redesigned their digital world from scratch: flight search, booking process, homepage, and much more.

What was considered too progressive in 2016 celebrated its first successes in 2017. Six years later, in 2023, it is now being expanded as a case study by DUMBO.

Note: This is a fictitious case study undertaken on our own initiative and was neither developed nor launched. With this study, we want to question habits, break down barriers and offer new food for thought to improve interactions.

A Flight Search Observation

If you, like most, have searched for a flight at some point, you are familiar with the usual song and dance involved with playing with the search criteria in order to score an optimal search result. If I change the travel date, will it be cheaper? If I depart from a different airport, will the flights be less full? As a result, the hunt is a never-ending combination of viewing results, making further refinements, and constantly changing the search criteria. So, do you see yourself here? Twenty-nine students who took part in the study “Online Search Behaviour in the Air Travel Market: Reconsidering the Consideration Set and Customer Journey Concepts” certainly did.

According to the 2017 study, flight search algorithms cover a dynamic solution space with often more than a thousand possibilities that can change rapidly, cutting it down based on user criteria. Users, on the other hand, go through three rounds of refinement on average, filtering and refining search criteria, comparing possibilities, and making trade-off judgments. The analyzed flight searches are insufficient to support the final decision: users must still make judgments and trade-offs depending on their personal preferences and priorities. The study emphasizes the importance of improved interfaces with decision support up to the final decision in order to improve the flight search experience.

Frontstage: What We Can See

When we observe people booking flights, we notice unpleasant side effects and interesting user hacks.

  • Searching for the right flight is extremely stressful.
    High prices, limited availability, artificial scarcity, a plenitude of options, as well as an ingrained penchant for cost traps and loopholes.
  • The flight gets more expensive with every search.
    Opaque pricing and the feeling of being on the airline’s hook make travelers suspicious of cookies and tracking.
  • Flights are like looking for a needle in a haystack.
    The route from Frankfurt to Honolulu alone offers 8,777 different flight combinations. To get a handle on what’s on offer, travelers turn to third-party providers like Swoodoo to combine different routes, or to Google to find offers from the surrounding area, and many more.
  • Long waiting times are nerve-wracking.
    Prices are recalculated, and availability is checked for every search query. In our test, a query usually takes 10 seconds. This always leads to long waiting times in the observed search behavior.
  • The quest for the best flight deal.
    The most important decision criterion for a flight is still the price. But every search parameter influences it. The lack of clear price communication reinforces the feeling of opacity.
  • The feeling of having paid too much for the flight.
    When it comes to flights, most travelers are confronted with “from” prices. However, these are only available on certain flights and in limited numbers. What if such flights are not available? This leads to negative anchoring: what seemed affordable at the beginning now seems all the more expensive.

Backstage: What We Don’t See

It takes a look behind the curtain to find out that there are numerous technical and business constraints that have an enormous impact on flight search. Rather than years of usability engineering, the search experience is largely determined by third-party booking systems, dynamic pricing, and cost-per-request mechanics.

  • 3rd Party Booking System.
    Behind most flight searches runs a reservation system called Amadeus. This is where millions of customers purchase their tickets. Amadeus is largely responsible for which data points are available and how the interface is designed. Airlines use these systems and can only exert limited influence on improving them.
  • Dynamic Pricing.
    Dynamic pricing is used to set the price of a product based on current market conditions. Prices fluctuate in real-time based on current data. This includes data on customer booking behavior, competitor airline prices, popular events, and a variety of other factors that affect product demand and necessitate price adjustments.
  • Cost per request.
    In most cases, searches are charged per request. To keep costs down, airlines want to reduce search requests. This leads to avoiding both pre-emptive and iterative queries.

Reframing The Problem

The classic flight search pattern inevitably leads to a frustrating trial-and-error loop.

Flight searches are structured in such a way that it is highly unlikely that a customer can find a suitable flight straight away because it presupposes that the traveler has entered all price-relevant information before submitting the search query.

The dilemma: this price-relevant information affects availability, travel time, and service. At the same time, these are factors that the traveler can change depending on the result and their personal preferences and flexibility. As a result, travelers develop their own user hacks to compare different search parameters and weigh the trade-off between price and convenience.

How can we give travelers a better flight search experience? Our pitch is The Balancing Act: a guided dialogue between traveler and airline. Strap in — we’re taking a deep dive.

The Flight Search Redesign: Introducing The “Balancing Act”

What makes a search successful? It’s an increasingly important question in the age of global travel and its limitless possibilities. We focus on finding your personal solution. It puts the traveler, their occasion, and their budget at the center of the interaction and looks at how well the flight offer fits. To do this, we fundamentally change the tailoring of the interaction with travelers. We break down the search form into individual tasks and change the sequence of interactions. This allows a more balanced approach between friction and progress.

We Will Take You There. But Where To?

Let’s start the flight search with the only question whose answer is not up for discussion: Where to? Knowing where you want to go, we might be able to help you to weigh up every further detail in terms of cost and convenience. This will allow you to make conscious decisions.

DUMBO (2023). Enter flight search by entering the destination. [Design Mockup] (Large preview)

Find The Perfect Connection

We will find the best departure point for you. Depending on where you want to go, where you plan to stay, and at what prices and conditions, we might be able to offer alternative routes that are easy on your wallet and get you to your destination comfortably and quickly.

DUMBO (2023). Origin airport selection, including alternatives. [Design Mockup] (Large preview)

Times That Suit You

Airplanes are almost always in the air, but they are not always the same. For some journeys, you are time-bound; for others — not. Best-price calendars, travel times, expected load factors (and much more) might help you to find the best flight for your journey.

DUMBO (2023). The date picker includes different views to highlight data according to personal preferences. [Design Mockup] (Large preview)

Without Getting In Your Way

We will react as quickly as possible. Even before you tell us the number of passengers, enter access codes, or create multi-stop flights, you should have an idea of whether there is a suitable flight for you.

DUMBO (2023). Flight plan with prices being subsequently loaded. [Design Mockup] (Large preview)

This Is How We Get There: Step By Step

To redesign the interface, we need to uncover the structure of the interaction moment. For this, we use the Interaction Archetypes framework to help us align our design with the underlying usage intention — the strongest driver for user interaction.

Task

The task is to find a suitable flight. We see that this usually takes several attempts and is achieved with the help of different search platforms and flight brokers. This shows that we are clearly in a weighing phase when searching for a flight. Different flights, routes, and times are weighed against travel planning criteria as well as personal preferences and limiting factors of the traveler.

Intention Of Use

The intention of use is a key determinant of interaction. The better we tailor our interface to the intention of use, the higher the probability that the interaction will be successful. Research findings show that usage intentions for digital applications can be assigned to three categories: “Act,” “Understand,” and “Explore.”

In our case, we can clearly attribute the flight search to the “Act” usage intention: users have a specific task and want to make progress in completing that task as quickly as possible. Flight search is characterized by a clear goal. Travelers want to get an overview of the available flights to find the best option for their specific solution space. They take a structured approach and selectively change search parameters to uncover inconsistencies and explore the limits of what is available.

Success

Changing various parameters shows that the solution space for this task is multi-dimensional. And not just that: on closer inspection, it becomes clear that a flight search is a hierarchical step process, a so-called “Analytic Hierarchy Process.” We assume that decision-making tasks are sequential. The traveler works their way from decision level to decision level. All levels of the flight search are causally related.

Goal

Flight search is inextricably linked to flight booking, which in turn is linked to travel to and from the destination. The primary goal of travelers is “to arrive.” Here, we observe the same causal relationship that we have already seen with the success factors. We are also dealing with a hierarchical step process.

This means that before they start looking for a flight, travelers have already considered the destination, the time of travel, the duration of the trip, and the travel costs. Travelers, therefore, usually have a kind of hidden agenda, which they consciously or subconsciously review in the course of their flight search.

Hypotheses

If we consider the problem and the context in which the interface is used, three hypotheses emerge. They open up a solution space for the flight search:

  • If we design the search along the decision levels, travelers can make faster and more confident decisions at each stage of decision-making.
  • If travelers can already weigh their options in terms of price and convenience at the moment of entry, the first search results are likely to be suitable, thereby reducing the re-submission of search queries.
  • If we show partial information as soon as it is available, travelers can quickly scan for suitable flight results, thereby reducing friction and the abandonment rate.

The Challenge

What we currently see in the airline and flight industry is that search assumes its parameters are fixed criteria. Accordingly, a “successful” search is one that delivers a result based on the roughly ten details entered up front. In interaction, however, travelers behave in a way that contradicts this assumption.

The decision-making levels through which travelers approach their destination provide information about it. They are hierarchical and causally related.

Each individual decision is the result of a trade-off between price and convenience. A successful search is, therefore, the smallest compromise.

So if we create space for trade-offs through interaction, we should be able to make the flight search more targeted to the traveler’s needs. This raises three major design opportunities:

  1. How might we utilize travelers’ decision-making levels to speed up the process?
  2. How might we help travelers balance price and convenience to reduce search queries?
  3. How might we deliver results to travelers faster to reduce friction and abandonment rates?

Solutions

From Static To Sequential

We say goodbye to the predominant route indication of a flight search and ask in the first step: Where do you want to travel to? We quickly realized that the “where to” question fits the mental model of travelers and can serve as a springboard for goal-oriented interaction. Only if we bring travelers closer to their destination can the airline make a relevant offer.

Four Steps

But that’s not all: We completely remove the search form and lead travelers to their flight in a dialogue. Following the decision-making levels, we ask for four pieces of information one after the other, on the basis of which we can generate a suitable flight plan:

  1. Specify the destination.
  2. Specify the origin.
  3. Select the departure time.
  4. Select the return time.

Four Moments Of Success

Each entry is given our full attention. This reduces the cognitive load and creates space for content, even on small devices. This, in turn, is only possible if the effort per input is less for the user than the added value generated in each case.

If we orchestrate this information along the decision-making levels of travelers and understand their causality, we can consciously bring about partial decisions. Meanwhile, on the way to the individual solution space, we create four moments of success:

  1. Can we fly to our destination? Check!
  2. Can we fly from a suitable departure point? Check!
  3. Can we fly out at the right time? Check!
  4. Can we fly back at the right time? Check!

More Later

We refrain from further queries in favor of showing the offer. All other criteria can be used to adjust the results while maintaining the flight schedule. These criteria are preselected based on the most frequent flight bookings or personal flight behavior.

  • Number of adults,
  • Number of children,
  • Number of infants,
  • Access codes to selected flights,
  • Selection of class.

From Passive To Proactive

To make well-informed decisions, travelers need to be aware of the consequences of their choices in the flight booking process. This means they need to understand the impact of their partial decision (date, departure location, airport) on the expected outcome. The better they can do this, the easier it is for them to weigh up. Ultimately, the best flight is the result of a personal trade-off between convenience and cost.

The Best Departure Point For You

If you live in western Germany, there are five possible departure points within a 90-minute radius. Frankfurt and Düsseldorf are two major hubs among them. So the departure airport is extremely flexible and raises questions:

  • Which departure airport comes into question?
  • Which airline is preferred?
  • What is an acceptable price range?
  • How mobile is one on the way to the airport?

Based on geolocation and the route network, conclusions can be drawn about a suitable departure airport depending on the destination. To do this, we look at nearby airports and rate them according to comfort and price. In addition, the travel time and the airline also play a role.

And that’s not all. We could place targeted offers, which could allow the airline to drive competition or control the load factor across the organization. In this way, attractive incentives can be created with the help of discounts, therefore positively influencing the actions of travelers.

The Best Time To Fly For You

The travel period is probably the most opaque parameter for travelers, yet it is also the most essential factor in determining airfare and availability. A single day earlier or later can quickly add up to several hundred euros. This can have a critical impact on travel planning.

We don’t want travelers to have to correct their search later, so we add additional indicators to the date selection. First and foremost, there is a price display that is broken down daily for outbound and return flights.

Travel planning does not always leave room for maneuver. Therefore, early indicators of availability are all the more important. For this purpose, we mark days on which the destination is not served as well as days with particularly high load factors. In this way, we can set impulses at an early stage of travel planning to avoid negative booking experiences.

The Best Ticket For You?

A “One-Way Flight” can be more expensive than a “Return Flight.” We consider the option “One-Way Flight” within the date selection. This is because it is an alternative to a return flight. And the associated price is an important piece of information to consider when travelers are weighing options.

Even before the flight plan has been loaded, we put all options on the table. This is how we offer maximum price transparency.

Disclaimer: Multi-stop flights were not considered in this case study.

From Accurate To Instant

If we communicate flights and their prices before checking whether the flight is fully booked or the prices have changed, there is a risk that the offer will have to be corrected. Naturally, we all want to make statements that can be fulfilled. But providing volatile flight data as quickly as possible requires a tolerance for errors in communication.

Consider an example: a flight was priced at 300 euros, or at least it was the day before. In the meantime, prices have changed, and the flight now costs 305 euros. The assumption based on the cached information was wrong and has to be corrected to the disadvantage of the customer.

Annoying, yes. But we were still in a position to give a price indication early on. After all, the flights before and after might cost 600 euros, which makes them far less relevant than a flight for 305 euros when the traveler had assumed 300 euros.

The communication error is less important than the added value at the moment of interaction. We can only overcome technical and business constraints with the help of estimates and assumptions.

Caching

In order to achieve price transparency, we have to refrain from requesting a live price calculation for every view. Due to the cost per request and the loading times, it is not possible to communicate prices exactly as they currently stand. Therefore, we have to cache prices from previous searches, at least until a flight selection can be made.

This also means we know that prices may change once the final flights are selected. The requirement for perfectly accurate price communication is sacrificed in favor of relevant selection criteria and fast loading times. After all, price is typically the most important factor in weighing any partial decision.

Flight Plan

To speed up the interaction, we need to put the availability check at the end. The route network has been determined; the flight plan has been drawn up. With the route information and travel times, we should have the corresponding flight plan immediately available. The availability check can be either downstream or simultaneous. In this way, we enable systems to communicate without being a hindrance to travelers.

Geolocation

Geolocation data can be used to draw conclusions about the departure airport. We do not necessarily have to use the geolocation API for this. It should also be possible to achieve sufficient localization with the help of IP address search so that we can immediately create added value. Once we have identified the airports in the vicinity, we can evaluate potential connections in terms of cost and convenience.

Overcoming Limits

Anyone who has ever had to search for and book a flight surely knows: it is nerve-wracking and time-consuming. Before you can even start the search, you have to enter ten details in the search form. This means that in the very first stage of the search, you’ve already had to make ten decisions. Unfortunately, and often, only one thing is certain: the destination of the journey.

Along our thought process, we have shown that the classic flight search pattern is broken, often because external factors such as technical and business constraints influence the flight search experience. However, we have shown that airlines and searches can break the pattern. This can be achieved by entering into a dialogue with the user and leading them from one decision level to another to eventually fit their specific needs and goals.

If you found this approach useful or interesting, I recommend our guide to developing your own Interaction Blueprint. It is based on our “Interaction Archetypes” framework that allows you to strategically illuminate a user’s behavioral patterns, as well as their interactions with digital interfaces. It has greatly transformed and improved our design process. We hope that it could transform your design process, too.

Using AI To Detect Sentiment In Audio Files

I don’t know if you’ve ever used Grammarly’s service for writing and editing content. But if you have, then you no doubt have seen the feature that detects the tone of your writing.

It’s an extremely helpful tool! It can be hard to know how something you write might be perceived by others, and this can help affirm or correct you. Sure, it’s some algorithm doing the work, and we know that not all AI-driven stuff is perfectly accurate. But as a gut check, it’s really useful.

Now imagine being able to do the same thing with audio files. How neat would it be to understand the underlying sentiments captured in audio recordings? Podcasters especially could stand to benefit from a tool like that, not to mention customer service teams and many other fields.

An audio sentiment analysis has the potential to transform the way we interact with data.

That’s what we are going to accomplish in this article.

The idea is fairly straightforward:

  • Upload an audio file.
  • Convert the content from speech to text.
  • Generate a score that indicates the type of sentiment it communicates.

But how do we actually build an interface that does all that? I’m going to introduce you to three tools and show how they work together to create an audio sentiment analyzer.

But First: Why Audio Sentiment Analysis?

By harnessing the capabilities of an audio sentiment analysis tool, developers and data professionals can uncover valuable insights from audio recordings, revolutionizing the way we interpret emotions and sentiments in the digital age. Customer service, for example, is crucial for businesses aiming to deliver personable experiences. We can surpass the limitations of text-based analysis to get a better idea of the feelings communicated by verbal exchanges in a variety of settings, including:

  • Call centers
    Call center agents can gain real-time insights into customer sentiment, enabling them to provide personalized and empathetic support.
  • Voice assistants
    Companies can improve their natural language processing algorithms to deliver more accurate responses to customer questions.
  • Surveys
    Organizations can gain valuable insights and understand customer satisfaction levels, identify areas of improvement, and make data-driven decisions to enhance overall customer experience.

And that is just the tip of the iceberg for one industry. Audio sentiment analysis offers valuable insights across various industries. Consider healthcare as another example. Audio analysis could enhance patient care and improve doctor-patient interactions. Healthcare providers can gain a deeper understanding of patient feedback, identify areas for improvement, and optimize the overall patient experience.

Market research is another area that could benefit from audio analysis. Researchers can leverage sentiments to gain valuable insights into a target audience’s reactions that could be used in everything from competitor analyses to brand refreshes with the use of audio speech data from interviews, focus groups, or even social media interactions where audio is used.

I can also see audio analysis being used in the design process. Like, instead of asking stakeholders to write responses, how about asking them to record their verbal reactions and running those through an audio analysis tool? The possibilities are endless!

The Technical Foundations Of Audio Sentiment Analysis

Let’s explore the technical foundations that underpin audio sentiment analysis. We will delve into machine learning for natural language processing (NLP) tasks and look into Streamlit as a web application framework. These essential components lay the groundwork for the audio analyzer we’re making.

Natural Language Processing

In our project, we leverage the Hugging Face Transformers library, a crucial component of our development toolkit. Developed by Hugging Face, the Transformers library equips developers with a vast collection of pre-trained models and advanced techniques, enabling them to extract valuable insights from audio data.

With Transformers, we can supply our audio analyzer with the ability to classify text, recognize named entities, answer questions, summarize text, translate, and generate text. Most notably, it also provides speech recognition and audio classification capabilities. Basically, we get an API that taps into pre-trained models so that our AI tool has a starting point rather than us having to train it ourselves.

UI Framework And Deployments

Streamlit is a web framework that simplifies the process of building interactive data applications. What I like about it is that it provides a set of predefined components that work well from the command line alongside the rest of the tools we’re using for the audio analyzer, not to mention that we can deploy directly to Streamlit’s service to preview our work. It’s not required, though, as there may be other frameworks you are more familiar with.

Building The App

Now that we’ve established the two core components of our technical foundation, we will next explore implementation, such as

  1. Setting up the development environment,
  2. Performing sentiment analysis,
  3. Integrating speech recognition,
  4. Building the user interface, and
  5. Deploying the app.

Initial Setup

We begin by importing the libraries we need:

import os
import traceback
import streamlit as st
import speech_recognition as sr
from transformers import pipeline

We import os for system operations, traceback for error handling, streamlit (st) as our UI framework and for deployments, speech_recognition (sr) for audio transcription, and pipeline from Transformers to perform sentiment analysis using pre-trained models.

The project folder can be a pretty simple single directory with the following files:

  • app.py: The main script file for the Streamlit application.
  • requirements.txt: File specifying project dependencies.
  • README.md: Documentation file providing an overview of the project.

Creating The User Interface

Next, we set up the layout, courtesy of Streamlit’s framework. We can create a spacious UI by calling a wide layout:

st.set_page_config(layout="wide")

This ensures that the user interface provides ample space for displaying results and interacting with the tool.

Now let’s add some elements to the page using Streamlit’s functions. We can add a title and write some text:

# app.py
st.title("🎧 Audio Analysis 📝")
st.write("[Joas](https://huggingface.co/Pontonkid)")

I’d like to add a sidebar to the layout that can hold a description of the app as well as the form control for uploading an audio file. We’ll use the main area of the layout to display the audio transcription and sentiment score.

Here’s how we add a sidebar with Streamlit:

# app.py
st.sidebar.title("Audio Analysis")
st.sidebar.write("The Audio Analysis app is a powerful tool that allows you to analyze audio files and gain valuable insights from them. It combines speech recognition and sentiment analysis techniques to transcribe the audio and determine the sentiment expressed within it.")

And here’s how we add the form control for uploading an audio file:

# app.py
st.sidebar.header("Upload Audio")
audio_file = st.sidebar.file_uploader("Browse", type=["wav"])
upload_button = st.sidebar.button("Upload")

Notice that I’ve set up the file_uploader() so it only accepts WAV audio files. That’s just a preference, and you can specify the exact types of files you want to support. Also, notice how I added an Upload button to initiate the upload process.

Analyzing Audio Files

Here’s the fun part, where we get to extract text from an audio file, analyze it, and calculate a score that measures the sentiment level of what is said in the audio.

The plan is the following:

  1. Configure the tool to utilize a pre-trained NLP model fetched from the Hugging Face models hub.
  2. Integrate Transformers’ pipeline to perform sentiment analysis on the transcribed text.
  3. Print the transcribed text.
  4. Return a score based on the analysis of the text.

In the first step, we configure the tool to leverage a pre-trained model:

# app.py
def perform_sentiment_analysis(text):
  model_name = "distilbert-base-uncased-finetuned-sst-2-english"

This points to a model in the hub called DistilBERT. I like it because it’s focused on text classification and is pretty lightweight compared to some other models, making it ideal for a tutorial like this. But there are plenty of other models available on the Hugging Face models hub to consider.

Now we integrate the pipeline() function that does the sentiment analysis:

# app.py
def perform_sentiment_analysis(text):
  model_name = "distilbert-base-uncased-finetuned-sst-2-english"
  sentiment_analysis = pipeline("sentiment-analysis", model=model_name)

We’ve set that up to perform a sentiment analysis based on the DistilBERT model we’re using.

Next up, we define a variable to hold the results we get back from the analysis:

# app.py
def perform_sentiment_analysis(text):
  model_name = "distilbert-base-uncased-finetuned-sst-2-english"
  sentiment_analysis = pipeline("sentiment-analysis", model=model_name)
  results = sentiment_analysis(text)

From there, we’ll assign variables for the score label and the score itself before returning it for use:

# app.py
def perform_sentiment_analysis(text):
  model_name = "distilbert-base-uncased-finetuned-sst-2-english"
  sentiment_analysis = pipeline("sentiment-analysis", model=model_name)
  results = sentiment_analysis(text)
  sentiment_label = results[0]['label']
  sentiment_score = results[0]['score']
  return sentiment_label, sentiment_score

That’s our complete perform_sentiment_analysis() function!
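
As a quick sanity check, you could call the function directly with a hard-coded sentence before wiring it into the UI. This snippet is just for local testing and isn’t part of the app itself:

# Quick local test (not part of app.py)
label, score = perform_sentiment_analysis("I really enjoyed this tutorial!")
print(label, round(score, 3))  # e.g., "POSITIVE" with a confidence close to 1.0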

Transcribing Audio Files

Next, we’re going to transcribe the content in the audio file into plain text. We’ll do that by defining a transcribe_audio() function that uses the speech_recognition library to transcribe the uploaded audio file:

# app.py
def transcribe_audio(audio_file):
  r = sr.Recognizer()
  with sr.AudioFile(audio_file) as source:
    audio = r.record(source)
    transcribed_text = r.recognize_google(audio)
  return transcribed_text

We initialize a recognizer object (r) from the speech_recognition library and open the uploaded audio file with sr.AudioFile. We then record the audio using r.record(source). Finally, we use the Google Speech Recognition API through r.recognize_google(audio) to transcribe the audio and obtain the transcribed text.
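
If you want more granular error handling than the generic try/except we add later, recognize_google() raises sr.UnknownValueError when the speech can’t be understood and sr.RequestError when the recognition service can’t be reached. Here’s a hedged sketch of what surfacing those separately could look like; it’s an optional variant, not the version used in the rest of the article:

# app.py (optional variant with recognition-specific error messages)
def transcribe_audio(audio_file):
  r = sr.Recognizer()
  with sr.AudioFile(audio_file) as source:
    audio = r.record(source)
  try:
    return r.recognize_google(audio)
  except sr.UnknownValueError:
    st.error("Could not understand the audio.")
    raise  # re-raise so the caller's error handling still runs
  except sr.RequestError as ex:
    st.error(f"Speech recognition service error: {ex}")
    raise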

In a main() function, we first check if an audio file is uploaded and the upload button is clicked. If both conditions are met, we proceed with audio transcription and sentiment analysis.

# app.py
def main():
  if audio_file and upload_button:
    try:
      transcribed_text = transcribe_audio(audio_file)
      sentiment_label, sentiment_score = perform_sentiment_analysis(transcribed_text)

Integrating Data With The UI

We have everything we need to display a sentiment analysis for an audio file in our app’s interface. We have the file uploader, a pre-trained language model to analyze the transcribed text, a function for transcribing the audio into text, and a way to return a score. All we need to do now is hook it up to the app!

What I’m going to do is set up two headers and a text area from Streamlit, as well as variables for icons that represent the sentiment score results:

# app.py
st.header("Transcribed Text")
st.text_area("Transcribed Text", transcribed_text, height=200)
st.header("Sentiment Analysis")
negative_icon = "👎"
neutral_icon = "😐"
positive_icon = "👍"

Let’s use conditional statements to display the sentiment score based on which label is returned. If the label doesn’t match, we call st.empty() to leave that section blank. (Note that the SST-2 DistilBERT model we’re using only returns POSITIVE or NEGATIVE labels, so the neutral branch is effectively a placeholder for models that do support a neutral class.)

# app.py
if sentiment_label == "NEGATIVE":
  st.write(f"{negative_icon} Negative (Score: {sentiment_score})", unsafe_allow_html=True)
else:
  st.empty()

if sentiment_label == "NEUTRAL":
  st.write(f"{neutral_icon} Neutral (Score: {sentiment_score})", unsafe_allow_html=True)
else:
  st.empty()

if sentiment_label == "POSITIVE":
  st.write(f"{positive_icon} Positive (Score: {sentiment_score})", unsafe_allow_html=True)
else:
  st.empty()

Streamlit has a handy st.info() element for displaying informational messages and statuses. Let’s tap into that to display an explanation of the sentiment score results:

# app.py
st.info(
  "The sentiment score measures how strongly positive, negative, or neutral the feelings or opinions are. "
  "A higher score indicates a positive sentiment, while a lower score indicates a negative sentiment."
)

We should account for error handling, right? If any exceptions occur during the audio transcription and sentiment analysis processes, they are caught in an except block. We display an error message using Streamlit’s st.error() function to inform users about the issue, and we also print the exception traceback using traceback.print_exc():

# app.py
except Exception as ex:
  st.error("Error occurred during audio transcription and sentiment analysis.")
  st.error(str(ex))
  traceback.print_exc()

This code block ensures that the app’s main() function is executed when the script is run as the main program:

# app.py
if __name__ == "__main__":
  main()

It’s common practice to wrap the execution of the main logic within this condition to prevent it from being executed when the script is imported as a module.
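
Because the UI snippets were shown piece by piece, it may not be obvious where they live relative to main(). Here’s one reasonable way to assemble the file; treat it as a sketch rather than the author’s exact layout:

# app.py (skeleton showing how the pieces nest; the layout and sidebar code stays at module level)
def main():
  if audio_file and upload_button:
    try:
      transcribed_text = transcribe_audio(audio_file)
      sentiment_label, sentiment_score = perform_sentiment_analysis(transcribed_text)

      # ...the st.header(), st.text_area(), conditional st.write(),
      # and st.info() calls from the previous section go here...

    except Exception as ex:
      st.error("Error occurred during audio transcription and sentiment analysis.")
      st.error(str(ex))
      traceback.print_exc()

if __name__ == "__main__":
  main()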

Deployments And Hosting

Now that we have successfully built our audio sentiment analysis tool, it’s time to deploy it and publish it live. For convenience, I am using the Streamlit Community Cloud for deployments since I’m already using Streamlit as the UI framework. It’s also a fantastic platform in its own right: it’s free and makes sharing your apps easy.

But before we proceed, there are a few prerequisites:

  • GitHub account
    If you don’t already have one, create a GitHub account. GitHub will serve as our code repository that connects to the Streamlit Community Cloud. This is where Streamlit gets the app files to serve.
  • Streamlit Community Cloud account
    Sign up for a Streamlit Community Cloud account so you can deploy the app to the cloud.

Once you have your accounts set up, it’s time to dive into the deployment process:

  1. Create a GitHub repository.
    Create a new repository on GitHub. This repository will serve as a central hub for managing and collaborating on the codebase.
  2. Create the Streamlit application.
    Log into Streamlit Community Cloud and create a new application project, providing details like the name and pointing the app to the GitHub repository with the app files.
  3. Configure deployment settings.
    Customize the deployment environment by specifying a Python version and defining environment variables.

That’s it! From here, Streamlit will automatically build and deploy our application when new changes are pushed to the main branch of the GitHub repository. You can see a working example of the audio analyzer I created: Live Demo.

Conclusion

There you have it! You have successfully built and deployed an app that recognizes speech in audio files, transcribes that speech into text, analyzes the text, and assigns a score that indicates whether the overall sentiment of the speech is positive or negative.

We used a tech stack that consists of little more than a pre-trained language model (via Transformers), a speech recognition library, and a UI framework (Streamlit) with integrated deployment and hosting capabilities. That’s really all we needed to pull everything together!

So, what’s next? Imagine capturing sentiments in real time. That could open up new avenues for instant insights and dynamic applications. It’s an exciting opportunity to push the boundaries and take this audio sentiment analysis experiment to the next level.
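
As a starting point for that experiment, the same speech_recognition library can capture live audio from a microphone when running locally (it needs the PyAudio package, and a hosted Streamlit app won’t have access to a server-side mic). This is purely a hypothetical sketch of the idea:

# Hypothetical sketch: transcribe live microphone input instead of an uploaded file
# (requires PyAudio; intended for running locally, not on Streamlit Community Cloud)
def transcribe_microphone():
  r = sr.Recognizer()
  with sr.Microphone() as source:
    r.adjust_for_ambient_noise(source)  # calibrate for background noise
    audio = r.listen(source)            # record until a pause is detected
  return r.recognize_google(audio)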

Visual Editing Comes To The Headless CMS

A couple of years ago, my friend Maria asked me to build a website for her architecture firm. For projects like this, I would normally use a headless content management system (CMS) and build a custom front end, but this time I advised her to use a site builder like Squarespace or Wix.

Why a site builder? Because Maria is a highly visual and creative person and I knew she would want everything to look just right. She needed the visual feedback loop of a site builder and Squarespace and Wix are two of the most substantial offerings in the visual editing space.

In my experience, content creators like Maria are much more productive when they can see their edits reflected on their site in a live preview. The problem is that visual editing has traditionally been supported only by site-builders, and they are often of the “low” or “no” code varieties. Visual editing just hasn’t been the sort of thing you see on a more modern stack, like a headless CMS.

Fortunately, this visual editing experience is starting to make its way to headless CMSs! And that’s what I want to do in this brief article: introduce you to headless CMSs that currently offer visual editing features.

But first…

What Is Visual Editing, Again?

Visual editing has been around since the early days of the web. Anyone who has used Dreamweaver in the past probably experienced an early version of visual editing.

Visual editing is when you can see a live preview of your site while you’re editing content. It gives the content creator an instantaneous visual feedback loop and shows their changes in the context of their site.

There are two defining features of visual editing:

  • A live preview so content creators can see their changes reflected in the context of their site.
  • Clickable page elements in the preview so content creators can easily navigate to the right form fields.

Visual editing has been standard among no-code and low-code site-builders like Squarespace, Wix, and Webflow. But those tools are not typically used by developers who want control over their tech stack. Fortunately, now we’re seeing visual editing come to headless CMSs.

Visual Editing In A Headless CMS

A headless CMS treats your content more like a database that's decoupled from the rendering of your site.

Until recently, headless CMSs came with a big tradeoff: content creators are disconnected from the front end, making it difficult to preview their site. They can't see updates as they make them.

A typical headless CMS interface just provides form fields for editing content. This lacks the context of what content looks like on the page. This UX can feel archaic to people who are familiar with real-time editing experiences in tools like Google Docs, Wix, Webflow, or Notion.

Fortunately, a new wave of headless CMSs is offering visual editing in a way that makes sense to developers. This is great news for anyone who wants to empower their team with an editing experience similar to Wix or Squarespace but on top of their own open-source stack.

Let’s compare the CMS editing experience with and without visual editing on the homepage of Roev.com.

You can see that the instant feedback from the live preview combined with the ability to click elements on the page makes the visual editing experience much more intuitive. The improvements are even more dramatic when content is nested deep inside sections on the page, making it hard to locate without clicking on the page elements.

Headless CMSs That Support Visual Editing

Many popular headless CMS offerings currently support visual editing. Let’s look at a few of the more popular options.

Tina

TinaCMS was built from the ground up for visual editing but also offers a “basic editing” mode that’s similar to traditional CMSs. Tina has an open-source admin interface and headless content API that stays synced with files in your Git repository (such as Markdown and JSON).

Storyblok

Storyblok is a headless CMS that was an early pioneer in visual editing. Storyblok stores your content in its database and makes it available via REST and GraphQL APIs.

Sanity.io (via their iframe plugin)

Sanity is a traditional headless CMS with an open-source admin interface. It supports visual editing through the use of its Iframe Pane plugin. Sanity stores your content in its database and makes it available via API.

Builder.io

Builder.io is a closed-source, visual-editing-first headless CMS that stores content in Builder.io’s database and makes it available via API.

Stackbit

Stackbit is a closed-source editing interface that’s designed to be complementary to other headless CMSs. With Stackbit, you can use another headless CMS to store your content and visually edit that content with Stackbit.

Vercel

Although it’s not a CMS, Vercel’s Deploy Previews can show an edit button in the toolbar. This edit button overlays a UI that helps content creators quickly navigate to the correct location in the CMS.

Conclusion

Now that developers are adding visual editing to their sites, I’m seeing content creators like Maria become super productive on a developer-first stack. Teams that were slow to update content before switching to visual editing are now more active and efficient.

There are many great options to build visual editing experiences without compromising developer control and extensibility. The promise of Dreamweaver is finally here!