What Is CI/CD? Beginner’s Guide To Continuous Integration and Deployments

CI/CD Explained

CI/CD stands for continuous integration and continuous deployment, and together they form the backbone of modern DevOps practices. CI/CD is the process that allows software to be built, tested, and delivered automatically at a steady cadence. As requirements grow and change quickly, development and integration need to keep the same pace to support business delivery.

What Is Continuous Integration?

CI, or continuous integration, relies on automated builds and tests. Changes made by developers are stored in a source branch of a shared repository. Any change committed to this branch goes through a build and a test run before it is merged. This ensures that the code being merged passes consistent quality checks.

Data Privacy and Its Impact on Management

In the modern digital epoch, the importance of data management can hardly be overstated. Data is no longer just an operational byproduct but the lifeblood of organizations, fueling everything from strategic decisions to customer interactions. However, in this race for data-driven insights, data privacy often emerges as the jigsaw piece that doesn't quite fit. The recent uptick in consumer awareness, enabled by social media and news cycles, further adds to the urgency surrounding data privacy issues. High-profile data breaches have shifted the focus from merely collecting data to securing it effectively. The key question that emerges is: How does the evolving landscape of data privacy regulations intersect with the demands and objectives of modern data management? This blog aims to dissect this complex interplay, shining a light on the challenges and opportunities that lie at this intersection.

The Legal Landscape of Data Privacy

Understanding the legal framework surrounding data privacy is vital. Global regulations like the General Data Protection Regulation (GDPR) in Europe, the California Consumer Privacy Act (CCPA), and the Health Insurance Portability and Accountability Act (HIPAA) in the United States have set stringent standards for the handling and storage of data. These regulations are designed with core principles in mind, such as data minimization and the right to be forgotten.

Real-Time Data Architecture Patterns

The rapid proliferation and increased volume of data across industries has magnified the need for organizations to have a solid strategy in place for processing and managing real-time data. Improving overall data capabilities enables teams to operate more efficiently, and emerging technologies have even created a smoother pathway for bringing real-time data closer to business users, which plays a critical role in effective decision-making.

This Refcard focuses on architecture patterns for real-time data capabilities, beginning with an overview of data challenges, performance, and security and compliance. Readers will then dive into the key data architecture patterns — their components, advantages, and challenges — setting the stage for an example architecture, which demonstrates how to use open-source tools for a common real-time, high-volume data use case.

Beginner’s Guide on How to Moderate Comments in WordPress

Are you wondering how to moderate comments on your WordPress site?

WordPress’s built-in comment system allows your readers to engage with your content and interact with you directly. That said, some comments may contain elements that hurt your brand and your website.

In this beginner’s guide, we will show you how to moderate comments in WordPress using the default WordPress features and some plugins.

What Is Comment Moderation in WordPress?

In WordPress, comment moderation is a feature that lets users control and filter the comments submitted on their websites.

With comment moderation, you can approve, edit, remove, or mark comments as spam before they appear publicly on your site.

While comments can build your website engagement, they can also pose a significant risk to your WordPress security.

Harmful comments usually come from spambots. These bots can fill the comment section with irrelevant or repetitive messages. As a result, it may be hard for real visitors to find and interact with each other.

Additionally, spam comments may have malicious links that redirect users to phishing websites or spread viruses by encouraging users to download dangerous files onto their devices.

Without comment moderation, your website can provide a poor user experience for your readers. It can also negatively impact your WordPress SEO.

If your site is filled with spammy comments, it can affect your site’s credibility and trustworthiness, leading to lower search engine rankings.

Whenever you build a new WordPress site, the default WordPress comment system will be active. Your blog post will have a comment form displayed at the bottom. Note that it may look different or not appear depending on the WordPress theme you are using.

WordPress comment form example

Generally, anyone with a valid name and email can leave a comment without verifying their identity. However, it doesn’t mean the comment will get approved automatically.

Instead, the commenter will see a preview of it and a message that the comment is awaiting moderation. This means the website owner will decide whether to approve or delete the comment.

What a comment awaiting moderation looks like on a WordPress website

This basic setting is good enough to filter genuine comments from harmful ones. But there’s actually a lot more that you can do to keep your WordPress blog safe.

With that in mind, let’s take a look at how you can moderate comments on your WordPress website.

Basics of Moderating WordPress Comments

You can see all comments on your WordPress website by clicking on the ‘Comments’ menu in the WordPress dashboard.

Opening the Comments menu on the WordPress admin panel

To learn more about the Comments page, you can check out our glossary entry on WordPress comments.

In this guide, we will talk more about what you should do when you receive a comment and what factors to look for when moderating it.

First, let’s cover some basics. When moderating comments, you should look for the following signs:

  • A bunch of links, keywords, and strange characters – This is the most obvious type of spam comment. Be careful about clicking any links in these comments, as they could lead to inappropriate content or even viruses.
  • Suspicious or generic names – If you see a comment left by a user named ‘Best Mortgage Rates’ or ‘Cheap Printer Ink,’ then it may be a marketer trying to spam your site for a backlink.
  • Generic messages – Often, spammers rely on generic comments to bypass your spam filter. Examples include Thank You, Really Nice Article, or a generic statement with your post title. It might also be something like, “I agree, beginner’s guide to comment moderation is essential.”
  • Offensive language – It’s important for WordPress bloggers to create a respectful space for their audience. Otherwise, you may risk making certain readers uncomfortable.

Now, let’s look at the different comment action links, which will appear when you hover your cursor over a comment. These are Approve, Reply, Quick Edit, Edit, History, Spam, and Trash.

The WordPress comment action links

To accept a comment, you can click on the ‘Approve’ button. This will make the comment publicly visible on your website.

If you want to let users know that their comment is live, read our guide on how to notify users when their comment is approved in WordPress.

To mark a comment as spam, you can click the ‘Spam’ button. It will move the comment to the Spam tab on the Comments page.

If a user complains that their comments are not appearing on your website, then this is the first place you should look. You can go to the ‘Spam’ tab and click the ‘Not Spam’ button below the comment.

Marking a comment as Not Spam on WordPress

You can also click on the ‘Empty Spam’ button to delete all spam comments at once. Even if you don’t, WordPress will automatically delete spam comments after 15 days.

If you find a comment in the All tab that is not necessarily spam but may be harmful to you and your readers, you can click the ‘Trash’ button. This will add the comment to the Trash tab.

Comments in the Trash will stay there for the next 30 days. After this time, WordPress will automatically delete them forever.

If you accidentally deleted a comment, then simply visit the ‘Trash’ tab and click on the ‘Restore’ link below the comment.

Restoring a WordPress comment from Trash

If you want to delete or mark multiple comments as spam, then you can use the ‘Bulk actions’ dropdown menu at the top of the comment list.

Note that doing this may cause your website to slow down while it processes all the comments.

Marking multiple comments as spam using the Bulk action option in WordPress

For more information, you can check out our guide on how to batch-delete spam comments in WordPress.

You can respond to a comment by clicking on the ‘Reply’ link. Once you’ve inserted your response, just click ‘Approve and Reply.’

Note that replying to a comment automatically approves it as well.

Approving and replying to a WordPress comment

The Quick Edit and Edit buttons work similarly. You can use either setting if you want to make the comment’s language clearer for visitors.

The difference is that, with Edit, you will be redirected to the Edit Comment page. With Quick Edit, you can modify the comment right on the Comments page like this:

Selecting the Quick Edit option on a WordPress comment

If you click the ‘History’ button, then you will see all the actions that have been done to the comment.

This feature can be helpful if you work with a team. It can help you track changes and understand how other people moderate comments on your site.

Reviewing the history of a WordPress comment

How to Configure the WordPress Comment Settings

We’ve covered the basics of moderating WordPress comments. We will now discuss the built-in WordPress comment settings, which will help you filter and control what kind of comments will appear on your website.

The comments settings page is located at Settings » Discussion. There are different sections on the discussion settings page, and we will walk you through each option on the page.

Changing the WordPress comment settings

Default post settings

The Default post settings offer three options to manage interactions and comments on your WordPress site:

The WordPress default post settings

The first option allows your blog to notify other blogs when you link to them in an article. The second option accepts notifications when they link to your articles.

These notifications are called pingbacks and trackbacks, and we recommend you uncheck both of these options. The first option can slow down your entire site, and the second option could bring you a lot of spam comments.

The third option on the article settings screen is ‘Allow people to post comments on new posts.’ It enables comments for all new articles you write on your WordPress blog.

Alternatively, you can turn comments on and off for individual articles, which we will show you later.

Other comment settings

WordPress' Other comment settings

In this section, you will notice the first option is ‘Comment author must fill out name and email.’ This option makes it mandatory for comment authors to provide a name and email address with their comments.

You need to check this option unless you want to allow anonymous commenting on your website.

There is also an option to require users to register on your site before leaving a comment. However, in our opinion, it’s not necessary for most sites as it may discourage new users from interacting with your post.

You will also see the option for closing comments on older articles. Some website owners use this to prevent spam, but it’s completely a personal preference.

Next is the ‘Show comments cookies opt-in checkbox, allowing comment author cookies to be set.’ Checking this box will let your website save the commenter’s name, email, and website details for when they want to comment on your post in the future.

Sometimes, WordPress comments can become a long thread that is difficult to keep track of. In this case, we recommend ticking the ‘Enable threaded (nested) comments’ option so that replies to specific comments appear directly beneath the original comment.

Having too many nested comments can negatively affect your page’s readability. The default setting of 5 levels is good enough for most WordPress websites.

If one of your articles becomes popular and gets too many comments, then the comment section will become too long. Users will have to scroll a lot to read the latest comments on the article.

To address this problem, you can check the option to break comments into pages. You can also use the dropdown menu to select whether to show the last or first comment page by default.

The last option is to display your most recent or oldest comments first. If you want to learn more about this, then you can read our guide on how to rearrange comments in WordPress.

‘Email me whenever’ and ‘Before a comment appears’

The WordPress comment 'Email me whenever' and 'Before a comment appears' settings

The next section allows you to receive email notifications whenever a user leaves a new comment on your site or a comment is held for moderation.

As you get more comments, these emails may become annoying, so we recommend turning the comment notifications off.

In the ‘Before a comment appears’ section, the first option is to approve each comment manually. Make sure this box is checked so that no comment can appear on your site without your approval.

Below this, you will see the ‘Comment author must have a previously approved comment’ option.

If this option is checked, then comments from authors with a previously approved comment will appear without explicit approval. Simply uncheck this option to make sure that all comments are manually approved.

Comment Moderation

WordPress Comment Moderation settings

As we’ve discussed before, a common trait among automated spam comments is that they contain a lot of links.

If you have already set your comments to be manually approved, then all your comments will go to the moderation queue regardless of how many links they have. If not, then you can specify to hold a comment in the queue if it contains a certain number of links.

You will also see a larger text area where you can enter words, IP addresses, email addresses, URLs, or browser information that you want to watch out for.

Any comment matching the things you enter here will be sent to the moderation queue.

Again, if you decide to have all comments manually approved, then you don’t need to do anything, as they are all going to the moderation queue anyway.
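To make the two rules above easier to picture, here is a minimal Python sketch of the decision they describe. It is not WordPress code. The function name, the dictionary keys, and the way links are counted are all assumptions made for illustration, and the link threshold is left as a parameter because the value you choose on the settings page is up to you.

```python
import re

# Illustration only: a rough model of the "hold for moderation" rules,
# not how WordPress implements them internally.
LINK_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def should_hold_for_moderation(comment, max_links, watch_terms):
    """Return True if the comment should wait in the moderation queue.

    comment     -- dict with optional keys: author, email, url, ip, content
                   (hypothetical field names)
    max_links   -- hold the comment when it has this many links or more
    watch_terms -- words, addresses, or URLs you want to watch out for
    """
    content = comment.get("content", "")

    # Rule 1: the comment body contains too many links.
    if len(LINK_PATTERN.findall(content)) >= max_links:
        return True

    # Rule 2: a watched term appears anywhere in the comment details.
    searchable = " ".join(
        comment.get(key, "") for key in ("author", "email", "url", "ip", "content")
    ).lower()
    return any(term.lower() in searchable for term in watch_terms if term.strip())


spam = {
    "author": "Cheap Printer Ink",
    "content": "Visit http://a.example and https://b.example",
}
print(should_hold_for_moderation(spam, max_links=2, watch_terms=[]))
```

If you already approve every comment manually, a check like this adds nothing, since every comment goes to the queue anyway.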

Disallowed Comment Keys

WordPress Disallowed Comment Keys settings

This setting was called Comment Blacklist in WordPress 5.4 and earlier. Here, you can set specific words that will automatically move the comment to Trash if used in a comment’s content, author name, URL, email, IP address, or browser information.

Make sure to use this feature carefully because real comments may get removed by mistake.

Avatars

WordPress Avatars settings

The last section on the Comments Settings screen is Avatars. These are the images that appear next to the comment author’s name on your website.

WordPress uses Gravatar, which is a free service that allows users to use the same avatar on all the blogs they visit. For more details, please see our guide on what Gravatar is.

We recommend checking the ‘Show Avatars’ box to make it easy to identify the different commenters on your post. You can also select the maximum rating of Gravatar that can be displayed on your blog.

WordPress uses Mystery Person as the default Gravatar when a comment author doesn’t have an image associated with their email address. You can change this by selecting a default avatar from the list or even adding your own custom default gravatar in WordPress.

That’s it! You have configured your comment settings. Don’t forget to click on the ‘Save Changes’ button to store your settings.

Clicking the 'Save Changes' button on the WordPress Discussion settings page

How to Moderate Comments Using Thrive Comments Plugin (Recommended)

The default WordPress comment system is good, but it can be pretty basic. For access to more comment management and engagement settings, you can install a WordPress comment plugin.

These plugins can not only improve comment moderation but also significantly boost your comment engagement. As a result, your visitors can enjoy a more engaging and safe commenting experience.

Thrive Comments is one of the best plugins to keep comments in check for a great user experience while encouraging user interaction.

For example, with the Comment Conversion feature, you can direct commenters to a custom thank-you page, social sharing buttons, or a related post so that they can discover more of your content.

The Thrive Comments WordPress plugin

To use Thrive Comments, you can purchase it as an individual plugin or get the complete Thrive Themes Suite. This comes with all Thrive products, including Thrive Ovation, which can turn your comments into testimonials for your web pages in one click.

Once you have completed your payment, you will get a plugin zip package to install on your WordPress site. For more information, read our step-by-step guide on how to install a WordPress plugin.

Once the plugin is installed, you will see a ‘Thrive Comments Moderation’ menu under the Comments tab on the admin panel. Here’s what the page looks like:

Opening the Thrive Comments Moderation page on WordPress

The interface looks similar to the built-in Comments section. However, there are several differences.

This interface now includes the ‘Unreplied’ and ‘Pending my reply’ tabs.

The first tab collects all comments that have not been replied to. Meanwhile, the second tab has all the comments that have been assigned to you by the website admin so that you can respond to them.

The Thrive Comments Moderation tabs and search menu

Additionally, you can filter comments by page by entering the page name in the ‘View comments on’ field.

If you want to look for specific comments, then you can type in a term from the comment into the ‘Search in comments’ field.

You also get more comment action links. Besides approving, removing, editing, and marking comments as spam, you can click the ‘Delegate’ button to assign a comment to another user.

This feature is handy if you run a WordPress blog with multiple writers.

Clicking the 'Delegate' button on a comment using the Thrive Comments plugin

If you click the ‘More’ button, then you will see the ‘Feature’ option. Selecting it will pin a comment to the top of the comment list on a blog post.

This way, important or noteworthy comments stay visible and easily accessible to all readers. All pinned comments can be found in the ‘Featured’ tab.

Clicking the 'Feature' option in the Thrive Comments plugin

For more details, you can see our guide on how to feature or bury comments in WordPress.

If you want to look at your entire comment activity, then just click the ‘Reports’ button at the top of the page.

You will be redirected to the Comments graph, which is a handy tool for evaluating your user engagement.

Clicking the Reports button on the Thrive Comments Moderation page

Here, you can see a timeline overview of all the comments you’ve received, approved, replied to, featured, marked as spam, and removed.

This is what the graph looks like on our testing site:

Thrive Comments graph report

You can also filter the comment activity using the options at the top. With ‘Show report,’ you can check out different types of reports. Or fill out the blog post title in the ‘View comments on’ field to see a comment graph from a specific post.

With the ‘Date interval’ option, you can change the time period of the graph. On the other hand, the ‘Graph interval’ setting lets you see the graph from a Daily, Weekly, or Monthly perspective.

The different filtering options in the Thrive Comments Reports page

How to Allow Specific Users to Moderate Comments in WordPress

Let’s say you work with a team to run your WordPress website, and you get a lot of comments every day. In this situation, you may want to grant comment moderation access to certain user roles only.

Doing this will let you assign comment moderation responsibilities to relevant team members best suited for the task, like a community manager.

This method not only helps you manage comments better but also keeps your WordPress site secure by allowing only the right users to access comments.

You can allow specific users to moderate WordPress comments in two ways: with the Thrive Comments plugin and the Comment Moderation Role plugin. Let’s take a look at each method.

Thrive Comments

To access the Thrive Comments’ moderation settings, go to Thrive Dashboard » Thrive Comments on your WordPress dashboard. Then, simply navigate to the ‘Comment Moderation’ tab.

Selecting user roles to moderate comments using the Thrive Comments Comment Moderation settings

At the top, you can check off which user roles can moderate comments.

Feel free to turn the ‘Exclude comments from moderators in the moderation dashboard’ setting on or off as well.

Enabling it will make comments from moderators invisible on the Thrive Comments dashboard. This can help maintain a clear overview of user comments.

The rest of the settings in this tab are the same as the ones you will find on the Settings » Discussion page. If you make changes to these settings in this menu, then they will also be reflected in the default WordPress comment settings.

Comment Moderation Role

WordPress doesn’t offer a default user role that’s dedicated to moderating comments. For this, you can use the Comment Moderation Role plugin.

The plugin is created by our team at WPBeginner, and it allows you to give certain users the role of ‘WPB Comment Moderator.’ Then the assigned user will only see the comment moderation screen in WordPress.

You can assign the WPB Comment Moderator role to existing and new users. For more details, please see our guide on how to allow blog users to moderate comments in WordPress.

If you use Thrive Comments, then you will also see the WPB Comment Moderator role in the Comment Moderation tab, like so:

The WPB Comment Moderator role in Thrive Comments

How to Disable Comments for Specific Posts in WordPress

If you want to close comments on certain posts, then WordPress lets you disable them.

On your WordPress dashboard, simply go to Posts » All Posts. Then, click the ‘Quick Edit’ button for any blog post.

Clicking the 'Quick Edit' button on a WordPress post

After that, just uncheck the ‘Allow Comments’ option.

Then, click ‘Update.’ The comment section will no longer be visible on the blog post.

Disabling comments using the WordPress Quick Edit function

It’s also possible to disable comments on multiple posts simultaneously. All you need to do is check the blog posts and select ‘Edit’ in the ‘Bulk action’ dropdown menu.

Then, go ahead and click ‘Apply.’

Bulk selecting WordPress posts to be edited

From here, you can change the Comments option to ‘Do not allow.’

After that, simply click the ‘Update’ button.

Disabling comments in bulk using the WordPress post Edit function

Finally, you can close the comment section while editing a blog post in the WordPress Block Editor. Simply go to the ‘Discussion’ box from the ‘Post’ settings menu on the right panel.

Once you have done that, you can uncheck the ‘Allow comments’ box.

Disabling the comment section on an individual WordPress post

If you want to remove the comment section for good, then just see our guide on how to completely disable comments in WordPress.

How to Filter Spam Comments With Akismet

To filter spam comments on your WordPress website, you can use Akismet. It’s a spam-filtering WordPress plugin developed by Automattic. This anti-spam plugin usually comes installed with your WordPress installation.

For more details, you can check out our guide on what Akismet is and why you should use it.

Once you mark a comment as spam, Akismet will learn to catch similar comments in the future.

If, for some reason, you have hundreds of spam comments in the ‘Pending’ tab, then simply click on the ‘Check for Spam’ button.

This will trigger a spam check on existing comments on your website, and Akismet will move the spam comments from Pending to Spam.

Clicking the 'Check for Spam' button on the WordPress Comments page

One way to combat spam comments further is by removing the URL field in the comment form. To do that, you can read our guide on how to remove the website URL field from the WordPress comment form.

We hope this article helped you learn how to moderate comments in WordPress. You may also want to check out our guide on how to make blog post comments searchable and our expert pick of the best WordPress plugins to grow your website.

The post Beginner’s Guide on How to Moderate Comments in WordPress first appeared on WPBeginner.

ChatGPT Integration With Python: Unleashing the Power of AI Conversation

In the ever-evolving landscape of artificial intelligence, language models have taken center stage, and GPT-3, the brainchild of OpenAI, has captivated developers and enthusiasts worldwide. ChatGPT, a specific implementation of the GPT-3 model, has gained popularity for its ability to generate human-like text and engage in meaningful conversations. Integrating ChatGPT with Python opens up a world of possibilities for creating interactive chatbots, automating customer support, enhancing user experiences, and much more.

In this blog, we will delve into the fascinating realm of ChatGPT integration with Python. We'll explore what ChatGPT is, the technology behind it, the benefits of using it, and provide practical examples of how to integrate ChatGPT with Python for a variety of applications.

A Five-Minute Intro to Crossplane

Welcome to this issue of The Activation Function. Every other week, I introduce you to a new and exciting open-source backend technology (that you’ve probably only kind of heard about… ) and explain it to you in five minutes or less so you can make better technical decisions moving forward.

In this issue, we’ll explore Crossplane, an open-source framework to provision and manage cloud resources across any cloud provider (aka a multi-cloud control plane) using the magic of Kubernetes.

Project Oxygen: Breathing New Life into Teams and Organizations

In today's fast-paced, ever-evolving business landscape, organizations are constantly on the lookout for ways to improve productivity, enhance team dynamics, and boost overall performance. Google, a company renowned for its innovative approach to workplace culture, embarked on a mission to identify the key factors that contribute to effective team management. The result of this endeavor is Project Oxygen, an in-depth research initiative that has transformed the way teams and organizations operate. In this article, we will delve into the origins of Project Oxygen, explore its core findings, and discuss how it can be applied to benefit teams and organizations.

Project Oxygen: A Breath of Fresh Air

Launched in 2008, Project Oxygen was born out of Google's desire to understand what makes a great manager. The company analyzed data from more than 10,000 observations, including performance reviews, feedback surveys, and nominations for top-manager awards. Through this extensive research, Google identified eight key behaviors that characterized its most effective managers. These behaviors, which have since been refined into ten, serve as the foundation for Project Oxygen and have been widely adopted by organizations around the world.

Implementing AI-Driven Edge Insights for Fleet Technology

In today's tech-driven world, fleet management has become a critical part of various industries. Whether it's tracking vehicles, optimizing routes, or monitoring vehicle health, developers are playing a pivotal role in building solutions for fleet technology. In this article, we'll walk you through the essential steps to create effective fleet technology solutions that can help streamline operations, increase efficiency, and enhance safety.

Before we dive in, let’s take a look at the background of connected fleet vehicles because it’s the reason we’re developing innovative AI fleet solutions.

Leveraging React in ServiceNow Applications

ServiceNow, the cloud-based platform that streamlines and automates IT service management, has become a staple in the modern enterprise. Its ability to enhance efficiency and productivity is unquestionable. However, what if you could take it a step further? What if you could supercharge your ServiceNow applications with the power of React? In this in-depth exploration, we will uncover the synergy between React and ServiceNow, helping you unlock new possibilities for your organization.

Understanding ServiceNow

Before we dive into the technical aspects of incorporating React into ServiceNow applications, it's essential to have a clear understanding of ServiceNow itself.

Language Modeling with LSTM using Wikipedia Text – Predicting Next Word

Language modeling is the cornerstone of advanced natural language processing, forming the backbone for cutting-edge technologies like ChatGPT. At its core, it involves predicting words based on context, a fundamental principle underlying modern large language models (LLMs). There are various techniques for language modeling, with attention mechanisms emerging as the latest innovation. To comprehend attention, understanding recurrent neural networks (RNNs) is crucial.

In this article, you will implement a language model in Keras using a Long Short-Term Memory (LSTM) network, a specialized type of recurrent neural network. We will focus on training our model with text data from Wikipedia. After training, the model will be able to predict the next word accurately based on input text.

So let's begin without further ado.

Importing Wikipedia Data

We will use the content from Wikipedia's page on "Artificial Intelligence" to train our next word predictor model.

You can import the Wikipedia data using the Python Wikipedia library. You can download the library using the following script:

! pip install wikipedia

Let's search some Wikipedia pages using a keyword. You can use the wikipedia.search() function to do so.

import wikipedia
pages = wikipedia.search("Artificial Intelligence")
pages

The search() method returns the following pages based on the keyword 'Artificial Intelligence'.

Output:

['Artificial intelligence',
 'Generative artificial intelligence',
 'Artificial general intelligence',
 'A.I. Artificial Intelligence',
 'Applications of artificial intelligence',
 'Hallucination (artificial intelligence)',
 'Ethics of artificial intelligence',
 'History of artificial intelligence',
 'Swarm intelligence',
 'Friendly artificial intelligence']

You can limit the number of results using the results parameter. The following script will return the single most relevant result.

pages = wikipedia.search("Artificial Intelligence", results = 1)
pages

Output:

['Artificial intelligence']

The output shows that the Wikipedia page "Artificial Intelligence" is the most relevant to our search query.

You can retrieve a page's contents using the wikipedia.page() method. The method accepts the page title. The following script prints the content of the Wikipedia article on "Artificial Intelligence" using the content attribute. The output below shows cropped content for the page.

ai_page = wikipedia.page(pages[0])
ai_page.content

Output:

Artificial intelligence (AI) is the intelligence of machines or software, as opposed to the intelligence of humans or animals. It is also the field of study in computer ...

We will use the content of the above page to train our LSTM model for the next word prediction.

Preprocessing Text for LSTM

An LSTM expects its input in a specific numeric format, so we need to preprocess the content of the Wikipedia page before we can train on it.

The first step is to split the page's contents into multiple sentences. You can use any splitting strategy that suits you. I will split the page into multiple sentences using the new line \n character as a splitter.

The script below returns 224 sentences and prints the first one. Note that we keep only sentences with at least two words, because we will later split the sentences into inputs and outputs and need at least one input word to predict the output word.

import re

# Split the text into sentences using '\n' as the separator
sentences = ai_page.content.split('\n')

# Function to remove special characters from a sentence
def remove_special_characters(sentence):
    return re.sub(r'[^\w\s]', '', sentence)

# Filter sentences with at least two words
sentences = [sentence.strip() for sentence in sentences if len(remove_special_characters(sentence).split()) >= 2]

# Printing length of total sentences
print(f"Total number of sentences: {len(sentences)}")

sentences[0]

Output:


Total number of sentences: 224

Artificial intelligence (AI) is the intelligence of machines or software, as opposed to the intelligence of humans or animals. It is also the field of study in computer science that develops and studies intelligent machines. "AI" may also refer to the machines themselves.
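
To see what the filter actually drops, here is a small self-contained sketch that applies the same split-and-filter logic to a made-up string. The sample text and the function name split_into_sentences are assumptions for illustration only; they are not taken from the Wikipedia page.

```python
import re

def split_into_sentences(text):
    """Split text on newlines and keep lines with at least two words."""
    kept = []
    for line in text.split("\n"):
        # Strip punctuation before counting words, as in the script above
        words = re.sub(r"[^\w\s]", "", line).split()
        if len(words) >= 2:
            kept.append(line.strip())
    return kept

# Hypothetical sample with blank lines and a one-word, heading-style line
sample = "Machines can learn from data.\n\n\n== History ==\nEarly systems used rules."
result = split_into_sentences(sample)
print(result)
```

In this sketch, the blank lines and the one-word heading are discarded, so only the two full lines survive.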

LSTM is a neural network that works with numbers. The input sentences in our dataset consist of texts. We need to convert the text inputs to numbers. You can use the Tokenizer class from the keras.preprocessing.text module to do so.

The fit_on_texts() method assigns an integer value to each unique word in your input sentences. The number of unique words will also be our vocabulary size.

You can see the integer value assigned to each word using the word_index attribute of the tokenizer class.


from keras.preprocessing.text import Tokenizer

# Create a tokenizer instance
tokenizer = Tokenizer()

# Fit the tokenizer on the text data
tokenizer.fit_on_texts(sentences)

# Print the word index (word to integer mapping)
tokenizer.word_index

Output:


{'the': 1,
 'and': 2,
 'of': 3,
 'to': 4,
 'a': 5,
 'in': 6,
 'that': 7,
 'ai': 8,
 'is': 9,
 'as': 10,
 'for': 11,
 'are': 12,
 'by': 13,
 'or': 14,
 'it': 15,
 'intelligence': 16,
 'be': 17,
 'learning': 18,
 'artificial': 19,
 'can': 20,
 'with': 21,
 'an': 22,
 'machine': 23,
 'on': 24,
 'not': 25,
 .......
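
The output above suggests that more frequent words receive smaller integers. The following pure-Python sketch shows one way such a frequency-ranked mapping could be built. It is only an approximation of the idea, not the Keras implementation: the function name build_word_index, the regular expression used for cleaning, and the toy sentences are all assumptions.

```python
import re
from collections import Counter

def build_word_index(texts):
    """Approximate a frequency-ranked word-to-integer mapping."""
    counts = Counter()
    for text in texts:
        # Lowercase and replace punctuation with spaces (a simplification
        # of whatever filtering the real tokenizer applies)
        cleaned = re.sub(r"[^\w\s]", " ", text.lower())
        counts.update(cleaned.split())
    # Most frequent word gets index 1; ties keep first-seen order.
    # Index 0 is left unused so it can serve as a padding value later.
    ranked = sorted(counts.items(), key=lambda kv: -kv[1])
    return {word: i for i, (word, _) in enumerate(ranked, start=1)}

toy_index = build_word_index(["the cat sat", "the dog sat on the mat"])
print(toy_index)
```

Here "the" appears three times and "sat" twice, so they receive the indexes 1 and 2, and the remaining words follow in the order they were first seen.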

The following script prints the vocabulary size for our model.

vocab_size = len(tokenizer.word_index)
print(vocab_size)

Output:

2472

Next, using the word_index dictionary mappings, we will convert all our input text sequences to integer sequences. The following script does this task and prints a partial integer sequence for the first sentence in our dataset.

Note: We did not show the whole sequence in the output due to space constraints.


# Convert text to integers using the tokenizer
int_sequences = tokenizer.texts_to_sequences(sentences)
int_sequences[0]

Output:

[19,
 16,
 8,
 9,
 1,
 16,
 3,
 63,
 14,
 175,
 10,
 604,
 .....

You can see that the first word in the first sentence is "Artificial", and it has been assigned the integer 19, which you can verify from the word_index dictionary. The second word is "intelligence," which is converted to 16, its integer counterpart from the word_index dictionary, and so on.
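
The lookup itself is simple enough to sketch in plain Python. The snippet below is a rough illustration of the conversion, assuming a hypothetical helper text_to_sequence and a small dictionary whose values are copied from the word_index output shown earlier. Words missing from the dictionary are skipped, which appears to mirror what the Keras tokenizer does when no out-of-vocabulary token is configured.

```python
import re

def text_to_sequence(text, word_index):
    """Map each known word to its integer; skip words not in the index."""
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    return [word_index[w] for w in cleaned.split() if w in word_index]

# Index values copied from the word_index output shown earlier
toy_word_index = {"the": 1, "ai": 8, "is": 9, "intelligence": 16, "artificial": 19}

seq = text_to_sequence("Artificial intelligence (AI) is the", toy_word_index)
print(seq)  # [19, 16, 8, 9, 1]

# A word outside the vocabulary is silently dropped in this sketch
unknown = text_to_sequence("Artificial zebra intelligence", toy_word_index)
print(unknown)  # [19, 16]
```

This matters at prediction time: if the input text contains only unseen words, the resulting sequence is empty and the model receives nothing but padding.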

The next step is tricky. We will not use the complete sequence as inputs to train our model. Instead, we will split each input sequence into multiple sub-sequences. Let's take an example to illustrate this: consider an example sequence [19, 16, 8, 9]. We will create sub-sequences like [19, 16], [19, 16, 8], and [19, 16, 8, 9].

The idea here is to provide the model with varying contexts for prediction. For instance, the first sub-sequence [19, 16] informs the model that, given the first word, we want to predict the second word. Similarly, the second sub-sequence [19, 16, 8] instructs the model to predict the third word when provided with the first two words as context. This pattern continues for all the integers in the input sequence, allowing the model to grasp the relationships between words at different positions in the sequence.

By structuring the input data in this way, we enable the model to learn the sequential nature of language. It learns to predict the next word based on all the previous words, making our language model more accurate and contextually aware.

The following script divides the data into sub-sequences.


# Initialize an empty list to store the processed sequences.
processed_sequences = []

# Iterate through each input sequence in the list of integer sequences.
for inp_sequence in int_sequences:
    # Create a temporary list containing the first two items of the input sequence.
    temp_list = inp_sequence[:2]  

    # Append a copy of the temporary list to the processed sequences list.
    processed_sequences.append(temp_list.copy())

    # Iterate through the remaining items in the input sequence starting from the third item.
    for item in inp_sequence[2:]:
        # Add the current item to the temporary list.
        temp_list.append(item)

        # Append a copy of the updated temporary list to the processed sequences list.
        processed_sequences.append(temp_list.copy())

Let's print the first three sub-sequences. If you map the integer values in the following outputs to the words using the word_index dictionary from our tokenizer, you will see that these sub-sequences correspond to the words "Artificial intelligence", "Artificial intelligence ai", "Artificial intelligence ai is".


processed_sequences[0], processed_sequences[1], processed_sequences[2]

Output:


([19, 16], [19, 16, 8], [19, 16, 8, 9])
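
If you want to check the mapping back to words yourself, a reverse dictionary does the job. The sketch below uses a hypothetical four-word dictionary with values copied from the word_index output shown earlier; the names index_to_word and decode are assumptions. With the real tokenizer you would invert tokenizer.word_index in the same way.

```python
# Index values copied from the word_index output shown earlier
toy_word_index = {"ai": 8, "is": 9, "intelligence": 16, "artificial": 19}

# Invert the mapping so integers can be looked up directly
index_to_word = {index: word for word, index in toy_word_index.items()}

def decode(sequence):
    """Turn a list of integers back into a space-separated string."""
    return " ".join(index_to_word[i] for i in sequence)

sub_sequences = [[19, 16], [19, 16, 8], [19, 16, 8, 9]]
decoded = [decode(s) for s in sub_sequences]
for text in decoded:
    print(text)
```

The decoded strings are lowercase because the tokenizer lowercases words before indexing them.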

Finally, we will split our sub-sequences into features and labels. We want to predict the last word in the sequence based on all the previous words. The following script converts data into features and output labels.


# Extract features (X) and labels (Y) using list comprehensions
X = [sequence[:-1] for sequence in processed_sequences]  # Features (excluding the last item in each internal list)
y = [sequence[-1] for sequence in processed_sequences]    # Labels (only the last item in each internal list)

Let's print the first three original sequences and corresponding features and labels.


print(f"First 3 sequences: {processed_sequences[0], processed_sequences[1], processed_sequences[2]}")
print(f"Features list:  {X[0], X[1], X[2]}")
print(f"Labels list: {y[0], y[1], y[2]}")

Output:


First 3 sequences: ([19, 16], [19, 16, 8], [19, 16, 8, 9])
Features list:  ([19], [19, 16], [19, 16, 8])
Labels list: (16, 8, 9)

Our input features have varying lengths, but Keras models require input features to be of the same shape. To achieve uniformity, we first determine the length of the largest input sequence. Then, we pad all smaller sequences by adding 0s, making them the same size as the largest sequence.

The following script demonstrates how to find the length of the largest input sequence.


# Find the length of the longest sentence. We will use this for padding
max_length = max(len(internal_list) for internal_list in X)
max_length

Output:


552

And the following script uses the pad_sequences() method from the keras.preprocessing.sequence module to pad the smaller input sequences with 0 at the beginning.


from keras.preprocessing.sequence import pad_sequences

# Apply pre-padding to the input features X using the pad_sequences function
X = pad_sequences(X, maxlen=max_length, padding='pre')
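
To make the effect of pre-padding concrete, here is a minimal pure-Python sketch of the same idea. The helper pre_pad is hypothetical and only imitates the behavior described above; it is not the Keras function. It also assumes that sequences longer than maxlen lose items from the front, which is my understanding of the default truncation behavior.

```python
def pre_pad(sequences, maxlen, value=0):
    """Left-pad each sequence with `value`; keep the last `maxlen` items if too long."""
    padded = []
    for seq in sequences:
        trimmed = seq[-maxlen:]  # drop items from the front when too long
        padded.append([value] * (maxlen - len(trimmed)) + trimmed)
    return padded

features = [[19], [19, 16], [19, 16, 8]]
padded_features = pre_pad(features, maxlen=3)
for row in padded_features:
    print(row)
```

With pre-padding, the real words always sit at the end of each row, right before the position whose word the model has to predict.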

Finally, we need to convert the output labels to one-hot encoded labels. The model's output will be a probability distribution over all the words in our vocabulary. Therefore, the number of output columns will equal the number of words in the vocabulary + 1.

We add one to the vocabulary size because the Keras tokenizer assigns word indexes from 1 to N, where N is the vocabulary size, whereas the to_categorical() function from the keras.utils module expects integer values from 0 to num_classes - 1. Therefore, to accommodate the word with the largest index value, we add 1 to the vocabulary size and pass it to the num_classes parameter of to_categorical().


from keras.utils import to_categorical

y = to_categorical(y, num_classes = vocab_size + 1)
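
The off-by-one reasoning is easier to see on a tiny example. The sketch below is a hypothetical pure-Python one_hot helper, not the Keras function, applied to an assumed vocabulary of three words with indexes 1, 2, and 3.

```python
def one_hot(labels, num_classes):
    """Return one row per label with a single 1 at the label's position."""
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} needs at least {label + 1} classes")
        row = [0] * num_classes
        row[label] = 1
        rows.append(row)
    return rows

toy_vocab_size = 3  # hypothetical vocabulary with word indexes 1, 2, 3
encoded = one_hot([1, 3], num_classes=toy_vocab_size + 1)
print(encoded)  # [[0, 1, 0, 0], [0, 0, 0, 1]]
```

With only toy_vocab_size columns, the label 3 would have no position of its own, which is why the extra column is needed. Column 0 stays unused because no word is assigned the index 0.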

Finally, we can print the shapes of our input features and output labels to see what our training data looks like.


print(f"{X.shape, y.shape}")

Output:


((9136, 552), (9136, 2473))

Training an LSTM Model

We are now ready to define and train our LSTM model.

In the code below, we define our Keras model for next-word prediction. We define the input shape, specifying the sequence length as max_length, which is the length of the longest input sequence. Next, we define the input layer, which serves as the entry point for our model.

Subsequently, we add an embedding layer, a crucial component in natural language processing tasks. This layer transforms words into fixed-size vectors, capturing semantic relationships between them. In this example, we set the embedding size to 100, but you can adjust it according to your specific use case.

Next, we incorporate an LSTM (Long Short-Term Memory) layer, a type of recurrent neural network (RNN) capable of capturing long-term dependencies in sequential data. The LSTM layer comprises 500 units; you can change this number to suit your use case.

Finally, our output layer employs the softmax activation function, creating a probability distribution over the words in our vocabulary. The model is compiled with the Adam optimizer, categorical cross-entropy loss function, and accuracy metric.

By calling get_model(), we instantiate our model and print its summary.


from keras.models import Model
from keras.layers import Embedding, LSTM, Dense, Input


def get_model():
  # Define the input shape (sequence length)
  input_shape = (max_length,)

  # Input layer
  input_layer = Input(shape=input_shape)

  # Embedding layer
  embedding_size = 100  # Example embedding size, adjust according to your use case
  embedding_layer = Embedding(input_dim=vocab_size + 1,
                              output_dim=embedding_size,
                              input_length=max_length)(input_layer)

  # LSTM layer
  lstm_1 = LSTM(500)(embedding_layer)

  # Output layer with softmax activation
  output_layer = Dense(vocab_size + 1,
                      activation='softmax',
                      name='output_layer')(lstm_1)

  # Create the model
  model = Model(inputs=input_layer, outputs=output_layer)

  # Compile the model with Adam optimizer, categorical cross-entropy loss, and accuracy metric
  model.compile(optimizer='adam',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

  return model

model = get_model()
# Print the model summary
model.summary()
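
As a sanity check on the model size, the number of trainable parameters can be estimated by hand. The sketch below uses the standard parameter-count formulas for embedding, LSTM, and dense layers and assumes default layer settings (biases enabled, a single bias vector per LSTM weight set). The helper names are hypothetical, and I have not compared the totals against the summary printed by Keras.

```python
def embedding_params(vocab_rows, embedding_dim):
    # One embedding vector per row of the lookup table
    return vocab_rows * embedding_dim

def lstm_params(input_dim, units):
    # Four weight sets (three gates plus the candidate cell state), each with
    # input weights, recurrent weights, and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, outputs):
    # Weight matrix plus one bias per output
    return input_dim * outputs + outputs

VOCAB_SIZE = 2472      # vocabulary size printed earlier
EMBEDDING_SIZE = 100   # embedding size used in get_model()
LSTM_UNITS = 500       # LSTM units used in get_model()

emb = embedding_params(VOCAB_SIZE + 1, EMBEDDING_SIZE)
lstm = lstm_params(EMBEDDING_SIZE, LSTM_UNITS)
dense = dense_params(LSTM_UNITS, VOCAB_SIZE + 1)
total = emb + lstm + dense
print(emb, lstm, dense, total)
```

Under these assumptions, the output layer and the LSTM layer each hold roughly 1.2 million parameters, so the model is large relative to a dataset of about nine thousand training samples.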


Finally, we train our model using the fit() method. We do not specify any validation data since we have a very small dataset, and our model will most likely overfit. The idea here is to see how to train a language model for next-word prediction. You can try validation data to see what results you get.


# Fit the model on the features X and the one-hot encoded labels y
model.fit(X, y,
          epochs = 30
          )

After 30 epochs, I achieved an accuracy of 98.97% on the training set.
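
If you want to experiment with validation data, one option is to hold out a random subset of the samples. The sketch below only computes the index split; the function name train_val_split, the 10% fraction, and the seed are assumptions, not part of the original experiment.

```python
import random

def train_val_split(num_samples, val_fraction=0.1, seed=42):
    """Return shuffled, disjoint lists of training and validation indices."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    num_val = int(num_samples * val_fraction)
    return indices[num_val:], indices[:num_val]

# 9136 is the number of training samples printed earlier
train_idx, val_idx = train_val_split(9136, val_fraction=0.1)
print(len(train_idx), len(val_idx))
```

You could then index X and y with these lists and pass the held-out part to fit() through its validation_data argument. Keep in mind that sub-sequences built from the same sentence overlap heavily, so splitting by sentence before creating the sub-sequences would probably give a more honest validation score.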


Using Model for Text Generation

Let's generate the next words using the language model we trained. We will define a method named generate_text() that accepts the input text and the number of words to generate as parameters and returns the generated text.


# Import the necessary libraries
import numpy as np

# Define a function 'generate_text' that takes input text and the number of words to generate as parameters
def generate_text(input_text, num_gen):

  # Initialize the current input with the given input text
  current_input = input_text

  # Iterate 'num_gen' times to generate the specified number of words
  for i in range(num_gen):
    # Convert the current input text into tokenized form using the tokenizer
    tokenized_text = tokenizer.texts_to_sequences([current_input])[0]

    # Pad the tokenized text to match the required input length for the model
    padded_text = pad_sequences([tokenized_text],
                                maxlen=max_length,
                                padding='pre')

    # Use the model to predict the next word in the sequence
    prediction = model.predict(padded_text, verbose=0)

    # Get the index of the predicted word with the highest probability
    predicted_index = np.argmax(prediction)

    # Find the corresponding word for the predicted index using the tokenizer's word index.
    # Default to an empty string in case the index (e.g., the padding index 0) has no word.
    predicted_word = ""
    for word, index in tokenizer.word_index.items():
      if index == predicted_index:
        predicted_word = word
        break

    # Add the predicted word to the current input for the next iteration
    current_input = current_input + " " + predicted_word

  # Return the generated text
  return current_input


Using the generate_text() method we just defined, let's generate the next 20 words for the input text "natural".


# Use a name other than `input` to avoid shadowing Python's built-in input() function
seed_text = "natural"
words_to_generate = 20
output = generate_text(seed_text, words_to_generate)
output

Output:

natural language processing nlp allows programs to read write and communicate in human languages such as english and economics and others

In the output, you can see that the model generated the 20 most probable next words. The output looks fairly coherent.
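
The generation loop is a form of greedy decoding: at every step, the single most probable word is appended and the extended text is fed back in. The sketch below isolates that loop from Keras so it can run on its own. The predictor toy_predict, the probability table, and the three-word vocabulary are all made up for illustration; in the real function, model.predict() plays the role of the predictor.

```python
def generate_greedy(seed_words, num_gen, predict_probs, index_to_word):
    """Repeatedly append the most probable next word to the running text."""
    words = list(seed_words)
    for _ in range(num_gen):
        probs = predict_probs(words)  # one probability per word index
        best = max(range(len(probs)), key=probs.__getitem__)  # argmax
        if best not in index_to_word:  # e.g. index 0, reserved for padding
            break
        words.append(index_to_word[best])
    return " ".join(words)

# Hypothetical stand-in for model.predict: looks only at the last word
toy_index_to_word = {1: "natural", 2: "language", 3: "processing"}
toy_table = {
    "natural": [0.0, 0.1, 0.8, 0.1],
    "language": [0.0, 0.1, 0.1, 0.8],
    "processing": [0.9, 0.1, 0.0, 0.0],
}

def toy_predict(words):
    return toy_table[words[-1]]

generated = generate_greedy(["natural"], 5, toy_predict, toy_index_to_word)
print(generated)
```

In this toy run, generation stops early because the most probable index after "processing" is 0, which has no word attached. Because greedy decoding always picks the top word, the same input text always produces the same continuation.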

Conclusion

Language models form the foundation of many state-of-the-art text generators such as ChatGPT, Bing, Bard, etc. In this tutorial, you saw how to develop a very naive language model that predicts the next most probable word. Using the LSTM model in Keras, you created a next-word predictor language model using text from a Wikipedia page.

However, this is just the tip of the iceberg. You should train your own next word predictor model on a larger dataset and see its performance on the validation data.

In one of my next articles, I will explain how to implement a large language model using attention layers in Keras. Till then, happy coding!

HTML Layout

The general appearance of a piece of writing, a picture, a piece of text, or another medium is created to appeal to the spectator and aid in understanding what they are looking at. For instance, Computer Hope has a distinctive layout that is identifiable to our visitors, making it easier for them to move around the website.

What Is an HTML Layout?

An HTML layout is a template for organizing web pages in a specific way. It is straightforward to use, understand, and adjust web design elements using HTML tags. A proper HTML layout is essential for any website and will significantly enhance its visual appeal. They will also be appropriately formatted on mobile devices because HTML layouts are often responsive by default.

How To Improve a GenAI’s Model Output

Generative AI, dating back to the 1950s, evolved from early rule-based systems to models using deep learning algorithms. In the last decade, advancements in hardware and software enabled real-time, high-quality content generation by large-scale generative AI models.

In this article, I’ll explain how you can successfully integrate Generative AI into large-scale production processes within a business environment, so that you know how to prepare for implementing Generative AI at an enterprise level, for example, in customer service, marketing communications, finance management, or other GenAI business applications.

Implementing a Comprehensive ERP System Using SAFe®

The modern business landscape, resplendent in its technological evolution, underscores the indispensable role of Enterprise Resource Planning (ERP) systems. These systems, though monumental in their operational scope, offer the allure of a streamlined organization. However, the journey to a successful ERP implementation, given its sheer complexity, necessitates a structured approach. The Scaled Agile Framework (SAFe®), with its emphasis on iterative development and cross-functional collaboration, emerges as the lighthouse in this turbulent sea of ERP integration. Dive deep with me as we meticulously explore the intertwining realms of ERP and SAFe®.

The Expansive Universe of ERP Modules: A Brief Overview

For a journey to be successful, the traveler must first understand the vastness of the terrain. The same holds for ERP, where each module is a significant landmark:

Java Library Development

The 'java-library-template' is a comprehensive solution for Java library developers that simplifies every aspect of library creation and maintenance.

This blog post explores the template's array of features, including one-click project setup, automated releases, security scans, and effortless Javadoc generation. Discover how to keep dependencies up to date with Renovate and ensure seamless publication to Maven Central.