Text Input Non-Editable

If you ever need to make an input field or text area non-editable, the readonly attribute comes in very handy. When the readonly attribute is present in the markup, it specifies that the user will not be able to modify the value. However, the user will still be able to tab to it and copy the text. Note: The readonly … Read more

The post Text Input Non-Editable appeared first on Web Design Weekly.

URL as a Sass Variable

Do you ever find yourself trying to remember the URL path of your images directory? Well, with Sass you can quickly add the path to the desired location and just use the variable throughout your code. Sass to the rescue. All you need to do is declare the variable and then use the interpolation syntax when you reference it. More … Read more

The post URL as a Sass Variable appeared first on Web Design Weekly.

How to change the Input Font Size based on Text Length

Recently I had a great conversation with an outstanding User Experience designer about a few interactions on a project we were working on. One of the points we talked about was what should happen when the text within an input field is excessively long. The desired result we both agreed on was to change the font size of the HTML input field … Read more

The post How to change the Input Font Size based on Text Length appeared first on Web Design Weekly.

Recent Post Shortcode

WordPress shortcodes are a simple way to set up functions to create macro codes for use in post content. For instance, the following shortcode (in the post/page content) would add your recent posts into the page: It’s pretty simple and brings your WordPress blog alive with ease. Recent Post Short Code In WordPress 1 Add this code to your functions.php file. 2 … Read more

The post Recent Post Shortcode appeared first on Web Design Weekly.

Extracting Structured Outputs from LLMs in LangChain

Large language models (LLMs) are trained to predict the next token (set of characters) following an input sequence of tokens. This makes LLMs suitable for unstructured textual responses.

However, we often need to extract structured information from unstructured text. With the Python LangChain module, you can extract structured information in a Python Pydantic object.

In this article, you will see how to extract structured information from news articles. You will extract the article's tone, type, country, title, and conclusion. You will also see how to extract structured information from single and multiple text documents.

So, let's begin without further ado.

Installing and Importing Required Libraries

As always, we will first install and import the required libraries.
The script below installs the LangChain and LangChain OpenAI libraries. We will extract structured data from the news articles using the latest OpenAI GPT-4o LLM.


!pip install -U langchain
!pip install -qU langchain-openai

Next, we will import the required libraries in a Python application.


import pandas as pd
import os
from typing import List, Optional
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import OpenAI
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

Importing the Dataset

We will extract structured information from the articles in the News Article with Summary dataset.

The following script imports the data into a Pandas DataFrame.


dataset = pd.read_excel(r"D:\Datasets\dataset.xlsx")
dataset.head(10)

Output:

image1.png

Defining the Structured Output Format

To extract structured output, we need to define the attributes of the structured output. We will extract the article title, type, tone, country, and conclusion. Furthermore, we want to categorize the article types and tones into the following categories.


article_types = [
    "informative",
    "critical",
    "opinion",
    "explanatory",
    "analytical",
    "persuasive",
    "narrative",
    "investigative",
    "feature",
    "review",
    "profile",
    "how-to/guide",
]

article_tones = [
    "aggressive",
    "neutral",
    "passive",
    "formal",
    "informal",
    "humorous",
    "serious",
    "optimistic",
    "pessimistic",
    "sarcastic",
    "enthusiastic",
    "melancholic",
    "objective",
    "subjective",
    "cautious",
    "assertive",
    "conciliatory",
    "urgent"
]

Next, you must define a class inheriting from the Pydantic BaseModel class. Inside the class, you define the attributes containing the structured information.

For example, in the following script, the title attribute contains a string type article title. The LLM will use the attribute description to extract information for this attribute from the article text.

We will extract the title, type, tone, country, and conclusion.


class ArticleInformation(BaseModel):
    """Information about a newspaper article"""

    title: str = Field(description="This is the title of the article in less than 100 characters")
    article_type: str = Field(description=f"The type of the article. It can be one of the following: {article_types}")
    tone: str = Field(description=f"The tone of the article. It can be one of the following: {article_tones}")
    country: str = Field(description= """The country which is at the center of discussion in the article.
                                         Return global if the article is about the whole world.""")

    conclusion: str = Field(description= "The conclusion of the article in less than 100 words.")


Extracting the Structured Output from Text

Next, you must define an LLM to extract structured information from the news article. In the following script, we will use the latest OpenAI GPT-4o LLM.


OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')
llm = ChatOpenAI(
    api_key=OPENAI_API_KEY,
    temperature=0,
    model_name="gpt-4o-2024-08-06",
)
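Note that os.environ.get() returns None when the variable is missing, so a missing key tends to surface later as a less obvious error. A minimal sketch of failing early with a clearer message follows; require_env is a hypothetical helper written for illustration, not part of LangChain or the OpenAI library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; not part of LangChain.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value
```

With this helper in place, the key lookup could be written as OPENAI_API_KEY = require_env("OPENAI_API_KEY").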

You need to define a prompt that instructs the LLM to act as an expert extraction algorithm while extracting structured outputs.

Subsequently, using the LangChain Expression Language, we will create a chain that passes the prompt to an LLM. Notice that here, we call the with_structured_output() method on the LLM object and pass the ArticleInformation class to the schema parameter of the method. This ensures the output object contains attributes from the ArticleInformation class.


extraction_prompt = """
You are an expert extraction algorithm.
Only extract relevant information from the text.
If you do not know the value of an attribute asked to extract,
return null for the attribute's value.
"""

prompt = ChatPromptTemplate.from_messages([
    ("system", extraction_prompt),
    ("user", "{input}")
])

extraction_chain = prompt | llm.with_structured_output(schema = ArticleInformation)

Finally, you can call the invoke() method of the chain you just created and pass it the article text.


first_article = dataset["content"].iloc[0]
article_information = extraction_chain.invoke({"input":first_article})
print(article_information)

Output:

image2.png

From the above output, you can see structured data extracted from the article.
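Because article_type and tone are declared as plain strings, the category lists reach the model only through the field descriptions, so a returned value may fall outside them. A minimal sketch of a post-extraction check, assuming you want to flag unexpected labels; check_label is a hypothetical helper and example_tones is a shortened stand-in for the article_tones list:

```python
def check_label(value, allowed):
    """Return the matching allowed label, or None if the value is not in the list.

    The comparison ignores case and surrounding whitespace.
    """
    if value is None:
        return None
    cleaned = value.strip().lower()
    for label in allowed:
        if label.lower() == cleaned:
            return label
    return None


# Shortened stand-in for the article_tones list defined earlier.
example_tones = ["neutral", "formal", "urgent"]

print(check_label(" Formal ", example_tones))  # formal
print(check_label("angry", example_tones))     # None
```

In practice you could call check_label(article_information.tone, article_tones) and decide how to handle a None result.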

Extracting a List of Formatted Items

In most cases, you will want to extract structured data from multiple text documents. To do so, you have two options: merge multiple documents into one document or iterate through multiple documents and extract structured data from each document.

Extracting List of Items From a Single Merged Document

You can merge multiple documents into a single document and then create a Pydantic class that contains a list of objects of the Pydantic class describing the structured data you want to extract. This approach suits a small number of documents, since a merged document can contain more tokens than the LLM's context window allows.

To do so, we will create another Pydantic class with a list of objects from the initial Pydantic class containing structured data information.

For example, in the following script, we define the ArticleInfos class, which contains articles, a list of ArticleInformation objects.


class ArticleInfos(BaseModel):
    """Extracted data about multiple articles."""

    # Creates a model so that we can extract multiple entities.
    articles: List[ArticleInformation]

Next, we will merge the first 10 documents from our dataset using the following script.


# Function to generate the formatted article
def format_articles(df, num_articles=10):
    formatted_articles = ""
    for i in range(min(num_articles, len(df))):
        article_info = f"================================================================\n"
        article_info += f"Article Number: {i+1}, {df.loc[i, 'author']}, {df.loc[i, 'date']}, {df.loc[i, 'year']}, {df.loc[i, 'month']}\n"
        article_info += "================================================================\n"
        article_info += f"{df.loc[i, 'content']}\n\n"
        formatted_articles += article_info
    return formatted_articles

# Get the formatted articles for the first 10
formatted_articles = format_articles(dataset, 10)

# Output the result
print(formatted_articles)

Output:

image3.png

The above output shows one extensive document containing text from the first ten articles.
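If the merged text risks exceeding the context window, one option is to split the articles into several smaller merged batches. The sketch below is an illustration under stated assumptions: batch_by_budget is a hypothetical helper, and the four-characters-per-token estimate is a rough rule of thumb rather than a real tokenizer count.

```python
def batch_by_budget(texts, max_tokens, chars_per_token=4):
    """Group texts into batches whose estimated token count stays under max_tokens.

    The estimate (len(text) // chars_per_token + 1) is a rough assumption,
    not a real tokenizer. A single text larger than the budget gets its own batch.
    """
    batches, current, current_tokens = [], [], 0
    for text in texts:
        estimated = len(text) // chars_per_token + 1
        # Start a new batch when adding this text would exceed the budget.
        if current and current_tokens + estimated > max_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(text)
        current_tokens += estimated
    if current:
        batches.append(current)
    return batches


docs = ["a" * 400, "b" * 400, "c" * 400]
print([len(batch) for batch in batch_by_budget(docs, max_tokens=250)])  # [2, 1]
```

Each batch could then be merged and passed to the chain separately.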

We will create a chain where the LLM uses the ArticleInfos class in the llm.with_structured_output() method.

Finally, we call the invoke() method and pass our document containing multiple articles, as shown in the following script.

If you print the articles attribute from the LLM response, you will see that it contains a list of structured items corresponding to each article.


extraction_chain = prompt | llm.with_structured_output(schema = ArticleInfos)
article_information = extraction_chain.invoke({"input":formatted_articles})
print(article_information.articles)

Output:

image4.png

Using the script below, you can store the extracted information in a Pandas DataFrame.


# Converting the list of objects to a list of dictionaries
articles_data = [
    {
        "title": article.title,
        "article_type": article.article_type,
        "tone": article.tone,
        "country": article.country,
        "conclusion": article.conclusion
    }
    for article in article_information.articles
]

# Creating a DataFrame from the list of dictionaries
df = pd.DataFrame(articles_data)

df.head(10)

Output:

image5.png

The above output shows the extracted article title, type, tone, country, and conclusion in a Pandas DataFrame.
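The conversion to dictionaries can also be written once as a small helper and reused for any list of extracted objects. A minimal sketch, assuming the objects expose the five attributes as shown in the article; to_records is a hypothetical helper, and the SimpleNamespace object with made-up values only stands in for an ArticleInformation instance:

```python
from types import SimpleNamespace

FIELDS = ["title", "article_type", "tone", "country", "conclusion"]


def to_records(items, fields=FIELDS):
    """Build one dictionary per item by reading the named attributes."""
    return [{name: getattr(item, name, None) for name in fields} for item in items]


# Stand-in object with made-up values; real input would be the extracted objects.
sample = SimpleNamespace(
    title="Example",
    article_type="informative",
    tone="neutral",
    country="global",
    conclusion="Example conclusion.",
)
records = to_records([sample])
print(records[0]["tone"])  # neutral
```

The resulting list of dictionaries can be passed to pd.DataFrame() in the same way as articles_data.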

Extracting List of Items From Multiple Documents

The second option for extracting structured data from multiple documents is to simply iterate over each document and use the Pydantic structured class to extract structured information. I prefer this approach if I have a large number of documents.

The following script iterates through the last 10 documents in the dataset (selected with dataset.tail(10)), extracts structured data from each document, and stores the extracted data in a list.


extraction_chain = prompt | llm.with_structured_output(schema = ArticleInformation)

articles_information_list = []
for index, row in dataset.tail(10).iterrows():
    content_text = row['content']
    article_information = extraction_chain.invoke({"input":content_text})
    articles_information_list.append(article_information)

articles_information_list

Output:

image6.png
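With many documents, a single failed call would stop the loop above. A minimal sketch of a more tolerant loop follows, assuming you would rather record failures and continue; extract_all is a hypothetical helper and fake_extract is a stub standing in for extraction_chain.invoke:

```python
def extract_all(texts, extract_fn, max_attempts=2):
    """Call extract_fn on each text; collect results and failures separately."""
    results, failures = [], []
    for position, text in enumerate(texts):
        for attempt in range(1, max_attempts + 1):
            try:
                results.append(extract_fn(text))
                break
            except Exception as error:  # broad on purpose for this sketch
                # Record the failure only after the last attempt.
                if attempt == max_attempts:
                    failures.append((position, repr(error)))
    return results, failures


def fake_extract(text):
    """Stub standing in for the extraction chain; fails on empty input."""
    if not text:
        raise ValueError("empty document")
    return {"length": len(text)}


results, failures = extract_all(["first", "", "third"], fake_extract)
print(len(results), len(failures))  # 2 1
```

In real use, extract_fn could be lambda text: extraction_chain.invoke({"input": text}).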

Finally, we can convert the list of extracted data into a Pandas DataFrame using the following script.


# Converting the list of objects to a list of dictionaries
articles_data = [
    {
        "title": article.title,
        "article_type": article.article_type,
        "tone": article.tone,
        "country": article.country,
        "conclusion": article.conclusion
    }
    for article in articles_information_list
]

# Creating a DataFrame from the list of dictionaries
df = pd.DataFrame(articles_data)

# Displaying the DataFrame
df.head(10)

Output:

image7.png

Conclusion

Extracting structured data from an LLM can be crucial, particularly for data engineering, preprocessing, analysis, and visualization tasks. In this article, you saw how to extract structured data using LLMs in LangChain, both from a single document and multiple documents.

If you have any feedback, please leave it in the comments section.

A regular expression refresher

#​701 — August 22, 2024

JavaScript Weekly

Regexes Got Good: The History (and Future) of Regular Expressions in JavaScript — Regular expression support was always a little underwhelming in JS, but things have improved. Steven takes us on a tour to refresh our knowledge, as well as show off his ‘regex’ library that boosts JS regexes to a true A++ rating. Steven was co-author of O’Reilly’s Regular Expressions Cookbook and High Performance JavaScript so knows his stuff.

Steven Levithan

WorkOS: The Modern Identity Platform for B2B SaaS — WorkOS is a modern identity platform for B2B SaaS, offering flexible and easy-to-use APIs to integrate SSO, SCIM, and RBAC in minutes instead of months. It’s trusted by hundreds of high-growth startups such as Perplexity, Vercel, Drata, and Webflow.

WorkOS sponsor

Node v22.7.0 (Current) Released — Node 22.6 let you strip types from source code, but now with --experimental-transform-types you can transform TypeScript-only syntax into JavaScript before running it too. Module syntax detection is now also enabled by default.

Rafael Gonzaga

Bun v1.1.25: Now Running at 1.29 Million Requests per Second — I’m having a little fun with the title, but the latest version of the JavaScriptCore-based JS runtime has added node:cluster support and uses this to demo a high level of HTTP throughput on a ‘Hello World’ example. Support for V8’s C++ API has also landed – notable because Bun isn’t V8-based.

Ashcon Partovi

IN BRIEF:

We’ve mentioned ECMAScript 2024 a bit recently, but Pawel Grzybek has a neat and tidy overview of what’s new in the ES2024 spec.

🐝 Could Wasp be ‘the JavaScript answer to Django’ for full-stack webdev? The Wasp team certainly thinks so.

🎙️ Ryan Dahl, creator of both Node.js and Deno, went on the Stack Overflow podcast to talk about Deno’s current limitations and what’s coming in Deno 2.0.

RELEASES:

PlayCanvas Engine 2.0 – A powerful JS-based Web graphics platform.

Node v20.17.0 (LTS) – The LTS release of Node adds support for require-ing synchronous ESM graphs.

Astro 4.14 – The popular agnostic content site framework now includes an experimental API for managing site content.

pnpm 9.8, Vuetify 3.7, Neo.mjs 7.0

Join Us for ViteConf on October 3rd — Learn how the best teams are building the next generation of the web with Vite!

StackBlitz sponsor

📒 Articles & Tutorials

50 TypeScript F–k Ups Mistakes — An admittedly colorfully-titled book digging into lots of subtle mistakes you might run into with TypeScript. It’s available on Leanpub in PDF, iPad, and Kindle forms, or you can read it all directly on its GitHub repo. At least worth a skim in case you’re running into any of its points.

Azat Mardan

The Official Redux Essentials Tutorial, Redux — The long-standing guide to using the popular Redux state container the right way, with best practices, has undergone a big reworking with TypeScript used throughout, new concepts added, and more coverage of RTK (Redux Toolkit) features.

Redux Team

React is (Becoming) a Full-Stack Framework — Is React merely a frontend library? How does the backend fit in? The author shares his thoughts on what led him to start considering React as more of a full-stack solution.

Robin Wieruch

📄 Using JavaScript Generators to Visualize Algorithms Alexander G. Covic

📄 Optimizing SPA Load Times with Async Chunks Preloading Matteo Mazzarolo

📄 Using isolatedModules in Angular 18.2 Thompson and Lyding (Angular Team)

📄 How to Generate a PDF in a JavaScript App Colby Fayock

🛠 Code & Tools

Milkdown: Plugin-Driven WYSIWYG Markdown Editor Framework — A lightweight WYSIWYG Markdown editor based around a plugin system that enables a significant level of customization. It’s neat to see the docs are rendered by the editor itself. GitHub repo.

Mirone

Fuite 5.0: A Tool for Finding Memory Leaks in Web Apps — A CLI tool that you can point at a URL to analyze for memory leaks. Here’s how it works. There’s also a video tutorial.

Nolan Lawson

✂️ Cut Your QA Cycles Down to Minutes with Automated Testing — Are slow test cycles limiting your dev teams’ release velocity? QA Wolf provides high-volume, high-speed test coverage for web and mobile apps — reducing your test cycles to minutes. Learn more.

QA Wolf sponsor

LogTape: Simple Logging Library with Zero Dependencies — I’m digging this new style of library that promises support across all the main runtimes (Node, Deno, Bun) as well as edge functions and the browser devtools.

Hong Minhee

📊 Chart.js 4.4: Canvas-Based Charts for the Web — One of those libraries that feels like it’s been around forever but still looks fresh and gets good updates. Bar, line, area, bubble, pie, donut, scatter, and radar charts are all a piece of cake to render. Samples and GitHub repo.

Chart.js Contributors

Legend State: A Tiny, Fast and Modern React State System — A year ago, Jack Herrington wondered if Legend State could be ▶️ ‘the ultimate state manager’ and things have progressed a lot since, with it now boasting being the fastest React state library in town.

Jay Meistrich

Tagger: Zero Dependency, Vanilla JavaScript Tagging Library — You can play with a live demo here.

Jakub T. Jankiewicz

tinykeys 3.0: A Keybindings Library in ~650 Bytes — Keeps things as simple and sweet as possible.

Jamie Kyle

heic-to: Convert HEIC/HEIF Images to JPEG or PNG in the Browser

Hopper Gee

Cheerio 1.0 – HTML/XML manipulation library for Node.

🎨 Chroma.js 3.0 – JavaScript color manipulation library.

eta (η) 3.5 – Embedded JS template engine for Node, Deno, and browsers.

Embla Carousel 8.2 – Carousel library with fluid motion and good swipe precision.

d3-graphviz 5.6 – Graphviz DOT rendering and animated transitions.

Alpine AJAX 0.9 – Alpine.js plugin for building server-powered frontends.

Happy DOM 15.0 – JS implementation of a web browser sans UI.

Elliptic 6.5.7 – Elliptic curve cryptography in plain JS.

Poku 2.5 – Cross-platform JavaScript test runner.

💚 Use Node? Check out the latest issue of Node Weekly, our sibling email about all things relating to Node.js — from tutorials and screencasts to news and releases. We do include some Node related items here in JavaScript Weekly, but we save most of it for there.

→ Check out Node Weekly

JavaScript’s Rust tool belt

#​702 — August 29, 2024

JavaScript Weekly

Rspack 1.0: The Rust-Powered JavaScript Bundler — Far from being ‘yet another bundler’ with its own approach and terminology to learn, Rspack prides itself on being webpack API and ecosystem compatible, while offering many times the performance. The team now considers it production ready and encourages you to try your webpack-based projects on it.

Rspack Contributors

💡 Rspack also has a family of ancillary tools worth checking out, such as Rsdoctor, a tool for analyzing and visualizing your build process (for both Rspack and webpack!)

Front-End System Design — Learn to create scalable, efficient user interfaces in this extensive video course by Evgennii Ray. Explore the box model, browser rendering, DOM manipulation, state management, performance and much more.

Frontend Masters sponsor

How to Create an NPM Package in 2024 — Sounds simple, but there are a lot of steps involved if you want to follow best practices, introduce useful tools, and get things just right. Matt Pocock walks through the process here, and there’s a 14-minute screencast too, if you’d prefer to watch along.

Matt Pocock

IN BRIEF:

🤖 v0 is an AI-powered tool from Vercel for, originally, generating shadcn/ui-powered React components based upon prompts you supply. Now, however, it has basic Vue.js support too.

Deno 1.46 has been released and promises to be the final 1.x release before the much awaited Deno 2.0. Deno’s Node compatibility improves even more (it now supports Playwright and many more things) and ships with V8 12.9.

📊 IEEE has published its latest annual list of top programming languages. JavaScript takes third place, but TypeScript has leapt up several places to fourth.

RELEASES:

Prisma 5.19 – The popular ORM for Node.js and TypeScript adds ‘TypedSQL’, a way to write raw SQL queries in a type-safe way.

📈 billboard.js 3.13 – Popular D3 chart library adds area-step-range charts.

pnpm 9.9 – Fast, space efficient package manager.

React Email 3.0, Ember 5.11, Bun v1.1.26

📒 Articles & Tutorials

JS Dates are About to Be Fixed — Handling dates and times is famously a painful area for programmers and JavaScript hasn’t done a lot to make it easier. Libraries like Moment.js help a lot, but Iago looks at how the Temporal proposal and its features will begin to help a lot more over time.

Iago Lastra

Weekly Chats on the Art and Practice of Programming — Your home for weekly conversations with fascinating guests about how technology is made and where it’s headed.

The Stack Overflow Podcast sponsor

JavaScript Generators Explained — Jan was frustrated by the quality of documentation and articles explaining generators in JavaScript, and set out to explain things in a way that a more advanced developer could appreciate.

Jan Hesters

Implementing a React-a-Like from Scratch — While it’s unlikely you’ll actually want to do this, at least thinking about it can prove instructive as to what’s going on in React’s engine room.

Robby Pruzan

▶  How to Implement the 2048 Game in JavaScript — Ania is back with one of her usual easy to follow walkthroughs of implementing a complete game in JavaScript. This time it’s the 2048 sliding puzzle game. (Two weeks ago she did Tic-Tac-Toe as well.)

Ania Kubów

Learn Role-Based Access Control and Simplify Permissions Management — Enhance security and streamline access by managing user roles with Clerk Organizations.

Clerk sponsor

📄 The Only Widely Recognized JS Feature Ever Deprecated – Spoiler: It’s with. Trevor Lasn

📄 Generating Unique Random Numbers in JavaScript Using Sets Amejimaobari

📺 21 Talks from the Chain React 2024 Conference – A React Native event. YouTube

📄 Exposing Internal Methods on Vue Custom Elements Jaime Jones

📄 The Interface Segregation Principle in React Alex Kondov

🛠 Code & Tools

TypeScript 5.6 Release Candidate — As always, Daniel presents an epic roundup of what’s new. We’ll focus more on it next week though, as the final release is anticipated to land next Tuesday (September 3).

Daniel Rosenwasser (Microsoft)

Vuestic UI 1.10: A Vue.js 3.0 UI Framework — Features 60 customizable and responsive components and with the v1.10 release it’s gained a significant bundle size optimization, a custom compiler that improves build time performance, and other minor enhancements. GitHub repo.

Vuestic UI

✅ Bye Bye Bugs — Get 80% automated E2E test coverage for mobile and web apps in under 4 months with QA Wolf. With QA cycles complete in minutes (not days), bugs don’t stand a chance. Schedule a demo.

QA Wolf sponsor

Material UI v6: The Popular React UI Design/Component System — At ten years old, the popular design system has its latest major release. There’s a focus on improved theming, color scheme management, container queries, and React 19 support. There are revamped templates to be inspired by, too.

García, Bittu, Andai, et al.

npm-check-updates 17.0: Update package.json Dependencies to Latest Versions — That is, as opposed to the specified versions. It includes a handy -i interactive mode so you can look at potential upgrades and then opt in to them one by one.

Raine Revere

Code Hike 1.0: Turn Markdown into Rich Interactive Experiences — Aimed at use cases like code walkthroughs and interactive docs, Code Hike bridges the gap between Markdown and React when creating technical content that takes full advantage of the modern web.

Rodrigo Pombo

Calendar.js: A Calendar Control with Drag and Drop — A responsive calendar with no dependencies, full drag and drop support (even between calendars), and many ways to manage events with recurring events, exporting, holidays, and more.

William Troup

📊 Perspective 3.0 – Data visualization and analytics component. The core is written in C++ and compiled to WebAssembly where it can be used from JavaScript. Their homepage shows it off well with a live example.

json-viewer 3.5 – Display JSON data in a readable, user-friendly way.

♟️ Stockfish.js 16.1 – A JavaScript chess engine.

jest-dom 6.5 – Jest matchers to test DOM state.

Marked 14.1 – Fast Markdown compiler / parser.

Javet 3.1.5 – Java + V8. Embed JS into Java.

Pixi.js 8.3.4 – Fast 2D on WebGL engine.