Laura Bouchard | The Blog Pros

July 21, 2021

How to Get and Set a Sitemap URL in Java

Sitemaps allow a webmaster to provide search engines with a list of URLs on a website that are available for crawling; this can be especially helpful if you’re attempting to increase your page rank. In addition to providing the list of URLs, sitemaps also supply information about each individual URL, such as relevancy to the website, update times, and change frequency.

Since the internet is a pretty crowded place, it is incredibly important to take the proper steps to ensure your site is getting the highest amount of traffic possible. If your website is just getting started and you are looking to increase traffic, creating a sitemap can assist search engines by providing insight into pages that are new, recently updated, or not immediately accessible through the browsable interface.
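
To make the per-URL details above more concrete, here is a minimal sketch of building a sitemap document by hand. It assumes the element names from the sitemaps.org 0.9 protocol (loc, lastmod, changefreq, priority); the class and method names (SitemapSketch, urlEntry, render) are hypothetical and only serve the illustration.

```java
import java.util.List;

/** Illustrative only: builds a sitemaps.org 0.9 document as a string. */
final class SitemapSketch {

    /** Escapes the characters XML reserves, so URLs containing '&' stay valid. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    /** Renders one <url> element; lastmod is expected as a date such as 2021-07-21. */
    static String urlEntry(String loc, String lastmod, String changefreq, String priority) {
        return "  <url>\n"
                + "    <loc>" + escapeXml(loc) + "</loc>\n"
                + "    <lastmod>" + escapeXml(lastmod) + "</lastmod>\n"
                + "    <changefreq>" + escapeXml(changefreq) + "</changefreq>\n"
                + "    <priority>" + escapeXml(priority) + "</priority>\n"
                + "  </url>\n";
    }

    /** Wraps already-rendered <url> elements in the <urlset> root element. */
    static String render(List<String> urlEntries) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String entry : urlEntries) {
            xml.append(entry);
        }
        xml.append("</urlset>\n");
        return xml.toString();
    }
}
```

Calling render with a list of urlEntry results yields a document that can be saved as sitemap.xml; a real site would likely generate the entries from its own page inventory.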

June 27, 2021

How to Convert DOCX to HTML in Java

As we have discussed in previous articles, while the Word DOCX format is the go-to for creating text documents, it can be insufficient when we enter the web-based territory. When considering formatting for online documents, it is Hyper Text Markup Language (HTML) that emerges as one of the clear winners for applications and websites. This dynamic language utilizes set cues or elements to construct documents that can be transmitted to browsers and presented to end-users as a readable web page. The structure of HTML even allows for the integration of images, interactive forms, and other objects that are more difficult to create with a straightforward Word document.

The Word DOCX format is actually based on a different markup language, XML (Extensible Markup Language). Microsoft transitioned its most popular programs – Word, Excel, and PowerPoint – to an open standard, XML-based format in the mid-2000s. This move was to create improvements in file size, image compression, and security, as well as to maintain an edge over their competitors. While some users still prefer the older DOC version due to its compatibility with other platforms, the DOCX format is generally the better choice for current word processing projects.

June 24, 2021

How to Convert JSON to XML or XML to JSON in Java

Understanding how to convert between complex file types will allow you to optimize and perfect your operations, ensuring your data is always in the appropriate form. The two formats featured in today’s tutorial are JSON and XML. Both formats are widely used in web-based applications, but one may be better suited than the other for certain objectives. JavaScript Object Notation (JSON) is defined as an open standard file and data-interchange format that uses human-readable text to store and exchange data. It was initially developed from JavaScript, but as a language-independent format it can be used in many different programming languages. Due to its versatility, efficient design, and low cost, JSON has become a go-to for many businesses.

Our other format, Extensible Markup Language (XML), is similar to JSON in that both are easy to read, create, and decode. However, XML uses tags to divide information according to its traits, and it reserves certain characters (e.g. <, >, and &) that must be escaped when they appear in data. While XML tends to be more verbose due to the tags, it does allow for more precision in how data is read by the computer, including richer metadata support (such as attributes and namespaces) that has no built-in equivalent in JSON.

June 11, 2021

How to Spellcheck Words and Sentences in Java

In our technology-driven world, electronic communication is increasingly overshadowing verbal communication. Whether we are filling out an online form, sending a text on our phone, or writing an email, it is a fact that many of our business and personal interactions require efficiently written (typed) language. Due to this heavy reliance on electronic communication, it is critical to ensure your online platform has a support system built-in to account for human error; if your application or website allows input or search queries from users, you run the risk of your systems not understanding the text due to spelling errors — and this is where spellcheck comes in.

When you stop and consider how many times you encounter spellcheck in your electronic interactions, it should become clear that it has created a huge failsafe for our often rushed and impatient natures. Spellcheck has come a long way since its beginnings; the first spellcheckers simply verified words instead of suggesting corrections. Fast forward to our current era and spellcheckers have improved in both functionality and efficiency; they operate in the background of our applications and let us know with a red line that we have made a potential error. This is often accomplished with Natural Language Processing (NLP) which, as we have discussed previously, enables computers to process and interpret human language in the form of text or audio data.
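
The difference between verifying a word and suggesting a correction can be sketched in a few lines. The example below is not the NLP-based approach mentioned above; it is a simple dictionary lookup plus Levenshtein edit distance, and the class name, method names, and tiny word list are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Locale;

/** Illustrative only: dictionary verification plus a nearest-word suggestion. */
final class SpellcheckSketch {

    /** Verification step: is the word in the dictionary (case-insensitive)? */
    static boolean isKnown(String word, List<String> dictionary) {
        return dictionary.contains(word.toLowerCase(Locale.ROOT));
    }

    /** Levenshtein distance: minimum insertions, deletions, and substitutions. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Suggestion step: the dictionary word with the smallest edit distance, or null if the dictionary is empty. */
    static String suggest(String word, List<String> dictionary) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : dictionary) {
            int d = editDistance(word.toLowerCase(Locale.ROOT), candidate);
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }
}
```

A production spellchecker would also weigh word frequency and context, which is where language models come in.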

June 3, 2021

How to Validate Code Identifiers in Java

When building applications and programs for use by other developers, ensuring you are utilizing valid code identifiers is an important component. These unique identifiers are symbols that have been designated to specific program elements essential to the successful creation of a program, and they can refer to a wide variety of features, including class, namespace, types, variables, and more. To avoid any confusion or misinterpretation, the names given to the symbols should clearly indicate the usage of the corresponding element. The general idea is to create an identity of sorts for the symbol so that it can easily be identified by any individual, not just the original developer.

During the creation of these integral program pieces, it is also important to be conscious of the technical nuances of the applicable programming language; for example, since Java is case sensitive, identifiers that differ only in case are treated as different identifiers. As you may know, attempting to navigate the various rules and requirements for code identifiers can take up a lot of time, so we are going to highlight an easy tool you can integrate to do the work for you. The following API can be run in Java to validate input identifiers and check for invalid characters, including whitespace, hyphens, underscores, and other special symbols within the identifier. This is accomplished by configuring input rules for the symbols, such as allowing the identifier “helloWorld” but not “hello*World”.
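
As a point of comparison, the JDK can already tell us whether a string follows Java's own identifier character rules. The sketch below uses the standard Character.isJavaIdentifierStart and Character.isJavaIdentifierPart methods; the class and method names are hypothetical, the rules are fixed rather than configurable, and reserved words such as "class" are deliberately not checked.

```java
/** Illustrative only: checks a string against Java's identifier character rules. */
final class IdentifierCheck {

    static boolean isValidJavaIdentifier(String candidate) {
        if (candidate == null || candidate.isEmpty()) {
            return false;
        }
        // The first character has stricter rules (for example, no digits).
        int first = candidate.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Every remaining code point must be a legal identifier part.
        for (int i = Character.charCount(first); i < candidate.length(); ) {
            int cp = candidate.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }
}
```

Note that Java itself permits underscores and dollar signs in identifiers, so stricter naming conventions would need additional rules on top of this check.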

May 21, 2021

How to Protect URLs from SSRF Threats in Java

Server-side request forgery (SSRF) attacks are yet another form of cyber-crime, and they are designed to specifically target a server by sending back-end requests from vulnerable web applications. These attacks can threaten not only servers but also other connected resources, such as cloud services in AWS, Azure, and OpenStack. They can be especially challenging to battle since they are generally used to target internal systems protected by firewalls that are inaccessible from the external network; by directing these strikes, the attacker has the potential to gain full or partial control of the requests sent by a web application.

There are multiple approaches that the malicious user may take in a typical SSRF attack; a frequently seen example is by inducing the server to create a connection back to itself or external third-party services. From here, the attacker can seize control of the third-party service URL to which the web application makes a request. Other examples include making requests to internal resources, running port scans on internal IPs, and more. These attacks exploit relationships that your server has built, inciting trust only to strike the vulnerable application and carry out their own agenda.
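
One common defensive step is to reject user-supplied URLs that point at internal addresses before the server requests them. The following is a minimal sketch of that idea using only java.net; the class and method names are hypothetical, and it does not cover redirects, DNS rebinding, or allow-lists, all of which a real defense would need to consider.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.UnknownHostException;

/** Illustrative only: flags URLs whose host resolves to an internal address. */
final class SsrfCheck {

    /** Loopback, wildcard, private (site-local), and link-local ranges count as internal. */
    static boolean isInternalAddress(InetAddress address) {
        return address.isLoopbackAddress()
                || address.isAnyLocalAddress()
                || address.isSiteLocalAddress()
                || address.isLinkLocalAddress();
    }

    /** Returns true when the URL should NOT be fetched on the user's behalf. */
    static boolean looksUnsafe(String url) {
        try {
            URI uri = new URI(url);
            String scheme = uri.getScheme();
            String host = uri.getHost();
            if (scheme == null || host == null) {
                return true; // not an absolute URL with a host
            }
            if (!scheme.equalsIgnoreCase("http") && !scheme.equalsIgnoreCase("https")) {
                return true; // file:, gopher:, and similar schemes are refused outright
            }
            // A hostname may resolve to several addresses; every one must be external.
            for (InetAddress address : InetAddress.getAllByName(host)) {
                if (isInternalAddress(address)) {
                    return true;
                }
            }
            return false;
        } catch (URISyntaxException | UnknownHostException e) {
            return true; // fail closed on anything we cannot parse or resolve
        }
    }
}
```

The link-local check matters in cloud environments, where metadata services are commonly exposed on a 169.254.x.x address.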

April 30, 2021

How to Generate and Compare Perceptual Image Hashes in Java

Perceptual image hashing is a relatively new process used primarily in the multimedia industry for content identification and authentication. The process itself uses an algorithm to extract specific features from an image and calculate a hash value based on that information. The hash value that is generated acts as a kind of ‘fingerprint’ for the image; it is a compact identifier derived from its parent image, and visually similar images produce similar hash values.

As you may have guessed by the fingerprint comparison, perceptual image hashing is particularly useful for digital forensics, but it has become an important player in prohibiting online copyright infringement as well. By comparing the hash value of an original/authentic image with the hash value of a similar image, you can calculate the Hamming Distance between them and decide whether the images match. For reference, Hamming Distance counts the positions at which two equal-length hash values differ (that is, the minimum number of substitutions needed to change one hash into the other), so a smaller distance means the images are more similar.
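
For hashes stored as fixed-width integers, the Hamming Distance is cheap to compute: XOR the two values and count the set bits. The sketch below assumes 64-bit hashes held in a long, which is an assumption for illustration rather than the format used by any particular API; the class and method names are hypothetical.

```java
/** Illustrative only: Hamming distance between two 64-bit perceptual hashes. */
final class HashDistance {

    /** Number of bit positions at which the two hashes differ (0 to 64). */
    static int hammingDistance(long hashA, long hashB) {
        // XOR leaves a 1 wherever the bits differ; bitCount tallies those 1s.
        return Long.bitCount(hashA ^ hashB);
    }

    /** Normalized similarity: 1.0 for identical hashes, 0.0 when every bit differs. */
    static double similarity(long hashA, long hashB) {
        return 1.0 - hammingDistance(hashA, hashB) / 64.0;
    }
}
```

In practice you would pick a distance threshold below which two images are treated as a match; the right value depends on the hashing algorithm and on how much alteration you want to tolerate.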

April 25, 2021

How to Check Text Inputs for SQL Injection Attacks in Java

SQL (Structured Query Language) injection is a code injection technique used to attack data-driven applications; the SQL statements are inserted into an entry field for execution and wreak havoc from there. This type of attack tends to seek and target existing security vulnerabilities within websites or other databases to acquire access to sensitive information. For example, if the field of an online form is coded incorrectly, this provides an opening for the malicious user to sneak in SQL commands that the system will consider valid. The response may then contain information that can be leveraged to access the data and manipulate, modify, or destroy it.

Despite the growing number of organizations that have reported successful SQL injection attacks, this type of threat is often underestimated in comparison to other cyber-crimes. Due to their reliance on check-out forms for their websites, retail companies have shown to be particularly susceptible to these threats. While standard firewalls may aim at protecting your website or application from SQL injection, the potential for failure can cause serious damage and data loss for your company. The following APIs can assist in providing supplementary protection by detecting SQL injection attacks from single or multiple text inputs, and will even define the threat detection level you want to utilize.
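
To give a rough idea of what detecting injection in a text input can look like, here is a deliberately naive pattern-based sketch. The patterns, class name, and method name are assumptions chosen for illustration; a heuristic like this produces both false positives and false negatives, and it is no substitute for parameterized queries.

```java
import java.util.regex.Pattern;

/** Illustrative only: a naive heuristic that flags common SQL injection shapes. */
final class SqlInjectionHeuristic {

    private static final Pattern SUSPICIOUS = Pattern.compile(
            "('\\s*(or|and)\\s+[^\\s]+\\s*=)"                       // quote followed by OR/AND comparison, e.g. ' OR 1=1
            + "|(--)|(/\\*)"                                        // SQL comment markers
            + "|(;\\s*(drop|delete|insert|update|alter)\\b)"        // stacked statement after a semicolon
            + "|(\\bunion\\s+(all\\s+)?select\\b)",                 // UNION-based extraction
            Pattern.CASE_INSENSITIVE);

    /** Returns true if the input contains any of the suspicious patterns above. */
    static boolean looksLikeInjection(String input) {
        return input != null && SUSPICIOUS.matcher(input).find();
    }
}
```

A check like this is best treated as an extra signal for logging or blocking, layered on top of queries that never concatenate user input into SQL.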

April 11, 2021

How to Get Macro Information From Word and Excel in Java

According to the Microsoft website, a macro is a series of commands that you can use to automate a repeated task and can be run when you must perform the task. The use of macro instructions was originally initiated for assembly language programming to perform two main purposes: to reduce the amount of coding that had to be written by generating several assembly language statements from one macro instruction, and to enforce program writing standards.

Nowadays, macros are used for a wide variety of purposes and industries; Microsoft applications currently use Visual Basic for Applications (VBA) programming language to build their macros, and they allow you to choose whether you want to enable or disable existing macros from use. However, when Microsoft was first designing macro features for the Microsoft Office suite, they weren’t thinking about potential internet security risks.

April 9, 2021

How to Add a Watermark to a PDF Document in Java

Throughout history, watermarks have been used to verify the authenticity and integrity of documents, currency, stamps, and more. They were originally developed for the paper-making process in thirteenth-century Italy to identify the manufacturer of the paper, and the practice spread rapidly across the rest of Europe. Fast forward several centuries to our current state and watermarks have bridged the gap between paper and digital mediums. Digital watermarks are used to verify authenticity and integrity as well, but in a different capacity than their paper counterparts. Some of the primary tasks they’re used for include copyright protection, source tracking, fraud protection, and online content management.

Since the PDF format is frequently used to share sensitive information, watermarks can be added to increase the security of the document and ensure that it isn’t employed in an incorrect or inappropriate manner. These markers can be used to denote draft documents, specify security levels, or indicate brand name/ownership. In the following tutorial, we will demonstrate how you can use an API in Java to instantly add a text watermark to your PDF documents; customizable features are available including font name, size, color, and transparency.

March 28, 2021

How to Protect Against XSS Attacks in Java

Cross-Site Scripting (XSS) attacks are a form of threat that takes advantage of vulnerabilities in web applications to prey on user information. Using malicious scripts, attackers can reach different users through a usually trustworthy web page and access any information logged in the browser by the user including cookies and other sensitive information. These kinds of attacks can occur wherever a web program accepts user input without validation and subsequently uses it within its output.

It is important to take all necessary steps toward protecting your users, and this is especially true in the case of XSS attacks, as a user may only be aware of their use of your website, and not the malicious actor who is threatening them. This can then harm your website’s reputation, as users will attribute any issues to your site and may be disinclined to return.
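
Since these attacks depend on user input being echoed into a page unmodified, one basic mitigation is to escape that input before writing it into HTML. Below is a minimal sketch of such an escaper; the class and method names are hypothetical, and it only covers text placed in an HTML body or a quoted attribute, not JavaScript, CSS, or URL contexts.

```java
/** Illustrative only: escapes text for safe inclusion in HTML body or quoted attributes. */
final class HtmlEscaper {

    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this in place, a submitted script tag is rendered as visible text instead of being executed by the browser.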

March 26, 2021

How to Convert PDF to Text in Java

Without the ability to copy, paste, or edit within a PDF document, it can be a frustrating task to manually transcribe a PDF to text. Fortunately for us, we have Optical Character Recognition (OCR) technology to help us out. We have discussed this a bit in previous articles, but to clarify, optical character recognition or optical character reader is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text.

OCR is most popular as a form of data entry for printed paper data records, but it is also frequently used to digitize printed texts so that they can be edited, stored compactly, or displayed online. This technology has been refined and trained to recognize patterns, and now with the additional assistance of AI, can provide a high degree of accuracy with little effort.

March 23, 2021

How to Extract Sentences and Entities From a String in Java

In this article, we will be discussing more great ways to utilize Natural Language Processing. As we have discussed in previous articles, natural language processing combines linguistics and artificial intelligence to perform large amounts of natural language data analysis. Essentially, this technology can simplify the scanning of content by categorizing and organizing it through machine learning. While these rules were formerly coded by hand, automatic learning has improved the process by leveraging statistical inference algorithms to produce models that can process unfamiliar or inaccurate information.

The two tasks that we will be covering today are how to extract sentences and how to extract entities from a string in Java. Extracting sentences from a string can be an incredibly time-consuming operation if you’re trying to parse chunks of text, but with the help of an NLP API, it becomes a quick and easy step. The API will scan the input string and return the separated sentences as individual strings, instantly making the text more readable for you or your customers.
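
For a sense of what sentence extraction returns, the JDK ships a rule-based sentence boundary detector in java.text.BreakIterator. The sketch below uses it in place of an NLP API, so it will be less accurate on abbreviations and unusual punctuation; the class and method names are hypothetical, and entity extraction is not covered here.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative only: rule-based sentence splitting with the JDK's BreakIterator. */
final class SentenceSplitter {

    static List<String> split(String text) {
        BreakIterator boundaries = BreakIterator.getSentenceInstance(Locale.US);
        boundaries.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = boundaries.first();
        // Walk consecutive boundaries; each [start, end) span is one sentence.
        for (int end = boundaries.next(); end != BreakIterator.DONE; start = end, end = boundaries.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }
}
```

Each returned element is a separate string, which mirrors the shape of result described above.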

March 21, 2021

How to Convert CSV to XML in Java

When it comes to transmitting and manipulating raw data, CSV and XML are two of the most popular file formats; CSV provides a simple way to organize data, and XML displays data using a markup language. However, if you need to perform complex data manipulation or data transfer between an array of applications, XML has wider support across programs than CSV. The CSV (Comma-Separated Values) format, while great at organizing and storing data, presents a few problems when it comes to readability. Understanding the data contained in a CSV file is dependent on knowledge of the headers, and even then the potential for error is high.

The Extensible Markup Language (XML) format was developed with simplicity and usability across the internet in mind; this component alone makes it the preferred format for use in web services and applications. In addition to internet compatibility, XML provides support for various human languages via Unicode, making it ideal for international business as well. XML-based document formats are used for an array of needs including office productivity tools, communication protocols, and industry data standards.
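
The mapping from CSV to XML can be sketched in plain Java: the header row supplies the element names, and each following row becomes one element. This is a simplified illustration with hypothetical names; it assumes the headers are valid XML element names and that no field contains quotes or embedded commas, which a real CSV parser would have to handle.

```java
/** Illustrative only: converts simple, unquoted CSV text into XML. */
final class CsvToXmlSketch {

    /** Escapes the characters XML reserves inside element content. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String convert(String csv, String rootName, String rowName) {
        String[] lines = csv.trim().split("\\R");
        String[] headers = lines[0].split(",", -1); // first line names the columns
        StringBuilder xml = new StringBuilder();
        xml.append('<').append(rootName).append(">\n");
        for (int r = 1; r < lines.length; r++) {
            if (lines[r].trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = lines[r].split(",", -1);
            xml.append("  <").append(rowName).append(">\n");
            for (int c = 0; c < headers.length; c++) {
                String tag = headers[c].trim();
                // Short rows are padded with empty values rather than failing.
                String value = c < cells.length ? cells[c].trim() : "";
                xml.append("    <").append(tag).append('>')
                   .append(escapeXml(value))
                   .append("</").append(tag).append(">\n");
            }
            xml.append("  </").append(rowName).append(">\n");
        }
        xml.append("</").append(rootName).append(">\n");
        return xml.toString();
    }
}
```

Because every value is wrapped in a tag named after its column, the XML output stays readable without referring back to the header row.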

March 10, 2021

How to Create a Web-Based Viewer in Java

A web-based viewer is essentially a basic visual display window for documents you want users to be able to explore on your website or application. Document viewers are a great way to get a quick look at a document without worrying if you have the correct applications to view it on your device. With these tools, customers can simply click on a PDF or Word document on a website and peruse it as they would in its native format.

Web-based or HTML viewers are also known as ‘zero footprint’ viewers as they do not require a separate client install or download, which allows for the viewing of documents within the browser itself. Client installations are less of a hassle if you are dealing with just one format on one device, but considering the ever-changing array of file types and machines that we utilize every day, they have become inefficient and cumbersome. In addition to the absence of software installation, the ‘zero footprint’ name also signifies that the viewer does not store data on the user’s device, and there are no storage parameters that need to be met.

March 6, 2021

How to Detect Hate Speech Text in Java

The expansion of online content has caused an uptick in the number of users who engage and interact with each other on the web. While this is generally a positive development, it has also provided a platform for many varieties of hate speech. To clarify, hate speech itself is defined as abusive or threatening speech or writing that expresses prejudice against a particular group. Community forums, social media, and websites have seen a rise in the use of the space for hate crimes or verbal abuse, and while it is important to preserve the right to free speech, it is equally important to provide a safe environment for users.

Hate speech detection tools have proven useful in reducing the amount of hate speech that appears on your websites or applications; social media giants such as Facebook, Twitter, and YouTube have all taken steps to detect and when appropriate remove hate speech from their platforms. In this article, we will discuss how to use a Hate Speech Detection API in Java to scan input text and determine if hate speech language is detected. The API utilizes Natural Language Processing AI to analyze the contextual nuances of the text and extract relevant information that will allow you to identify the majority of hate speech. This tool is currently only available for English input text and will use 1-2 API calls per sentence.

February 22, 2021

Create a New ZIP Archive File in Java

Creating a ZIP file is one of the best ways to compress large amounts of data into an easily shareable format. The format was originally developed in 1989 for use by PKWARE Inc.’s PKZIP utility and was quickly picked up and implemented by other software vendors, such as Microsoft and Apple. Since then, the term “zipping a file” has become synonymous with file compression.

There are many advantages to zipping files, with the most obvious reason being to optimize file storage. If you have an abundance of infrequently used files, they can be archived into a single, easily located ZIP file. Other advantages of zipping files include the ability to securely email a previously unmanageable amount of information to a client, as well as the easy transference to USB flash drives.
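
Archiving several files into one ZIP can be done with the JDK's own java.util.zip package. The sketch below works in memory so it is easy to try; the class and method names are hypothetical, and a real application would more likely stream to and from files on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Illustrative only: builds a ZIP archive in memory and lists its entries. */
final class ZipSketch {

    /** Packs each map entry (entry name to text content) into one archive. */
    static byte[] zip(Map<String, String> files) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                zos.putNextEntry(new ZipEntry(file.getKey()));
                zos.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads the archive back and returns the entry names in stored order. */
    static List<String> entryNames(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry entry = zis.getNextEntry(); entry != null; entry = zis.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

Writing the returned bytes to a file with a .zip extension produces an archive that standard tools can open.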

February 16, 2021

Decrypt a PDF Document With a Password in Java

Document encryption is one of the most common methods to ensure the secure passing of information between a business and its external customers. The encryption feature is offered on all PDF files and acts as an assurance that nobody who attempts to intercept the information will be able to open it without the password, which should follow in a separate communication.

Once you receive the file it should usually be stored with the encryption intact. However, if you have an encrypted drive or store your confidential files in an encrypted container, decrypting the file becomes an option. Decryption of a file will enable the print function, which is disabled on a secured PDF, and will also ensure that documents that were downloaded online can be opened without a password in the future.