An Open-Source Library That Improves Java's Ability to Process Structured Data

Contemporary Java application frameworks increasingly separate data storage from data processing in order to make applications more maintainable, scalable, and easier to migrate. Microservices, currently very popular, are a typical example. These frameworks require business logic to be implemented within the Java program rather than performed in the database, as conventional designs do.

In most applications, business logic involves structured data processing. SQL-based databases offer extensive support for such processing, so business logic can be implemented in a relatively simple way. Java has always lacked comparable support, which makes implementing the same logic in the language complicated and inefficient. As a result, development efficiency drops sharply even though the architectural advantages of the framework are clear.

Open-Source SPL That Can Execute SQL Without RDB

SQL syntax is close to natural language and has a low learning threshold. With the added bonus of first-mover advantage, it soon became popular among database vendors and users. After years of development, SQL has become the most widely used and most mature structured data computing language.

However, SQL depends on an RDB, and in many scenarios there is none: the data may come from CSV files, RESTful JSON, MongoDB, or other sources, or a calculation may mix several sources, such as CSV and XLS. In these scenarios, many people hard-code the algorithms in a high-level language such as Java or C#, which requires writing lengthy low-level functions from scratch, and execution efficiency is difficult to guarantee.

Comparing SQL and SPL: Order-Based Computations

Reference of a neighboring record

This is a common and simple type of order-based calculation. When records are sorted in a specific order, the calculation on one record refers to a neighboring record. Below are some examples:

1. Calculate the daily growth rate of a specific stock’s closing prices (link relative ratio). 
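
To make the neighboring-record reference concrete, here is a minimal Python sketch of the growth-rate calculation. It is only an illustration of the idea, not SPL or SQL code from the article; the function name daily_growth_rates and the sample prices are made up, and the input is assumed to be already sorted by date.

```python
from typing import List, Optional


def daily_growth_rates(closes: List[float]) -> List[Optional[float]]:
    """Link relative ratio: (today's close - yesterday's close) / yesterday's close.

    `closes` is assumed to be sorted by trading date. The first day has no
    previous record, so its rate is None.
    """
    if not closes:
        return []
    rates: List[Optional[float]] = [None]
    # zip pairs each record with its previous neighbor: (day 0, day 1), (day 1, day 2), ...
    for prev, curr in zip(closes, closes[1:]):
        rates.append((curr - prev) / prev)
    return rates


# Hypothetical closing prices for three consecutive trading days.
closes = [100.0, 110.0, 99.0]
rates = daily_growth_rates(closes)
```

The key step is pairing each record with the one before it, which only makes sense because the records are ordered.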

Looking for Suitable Tools for Calculating Large Files

What is a large file? A large file is one that is too large to be read into memory at one time. Desktop data tools such as Excel cannot handle it directly, so a program often has to be written to deal with it. Even then, the file must be read in batches for calculation and processing, and the per-batch results must finally be combined in a way that suits the type of calculation, which is much more complicated than processing a small file. There are many types of large files, such as text files, Excel files, XML files, JSON files, and HTTP files. Among them, text (TXT or CSV) is the most common.
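
The read-in-batches, then-summarize pattern described above can be sketched roughly as follows in Python. This is an illustrative sketch under assumptions: the column name amount, the helper names read_batches and average_amount, and the tiny in-memory sample are all hypothetical. With a real file you would pass a handle opened with open(path, newline="") instead of the StringIO object.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO


def read_batches(f: TextIO, batch_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most `batch_size` rows so the whole file never sits in memory."""
    reader = csv.DictReader(f)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


def average_amount(f: TextIO, batch_size: int = 2) -> float:
    """Average of the (hypothetical) 'amount' column, computed batch by batch.

    An average cannot be obtained by averaging per-batch averages, so each
    batch contributes a partial sum and a partial count that are merged at the end.
    """
    total = 0.0
    count = 0
    for batch in read_batches(f, batch_size):
        total += sum(float(row["amount"]) for row in batch)
        count += len(batch)
    if count == 0:
        raise ValueError("no data rows")
    return total / count


# A small in-memory stand-in for a large CSV file.
sample = io.StringIO("id,amount\n1,10\n2,20\n3,30\n4,40\n5,50\n")
avg = average_amount(sample, batch_size=2)
```

The summarizing step depends on the calculation: a sum or count merges by addition, an average needs both, and operations such as sorting or grouping on many keys need more elaborate merging.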

The programming languages and tools that can be used to process large files are as follows:
1. Conventional high-level programming languages, such as Java, C/C++, C#, Basic, etc.
2. SQL, after the file data is imported into a database
3. Python
4. esProc SPL

Looking for the Best Lightweight Data Analysis Script Tools

Almost all programming languages can manipulate data. Some, such as C++ and Java, are so general that they lack functions for structured computations; they produce lengthy code for everyday data analysis and are better suited to large, specialized projects. Others, such as the mathematical programming languages MATLAB and R, are technically targeted and too specialized for daily analysis work, even though they provide functions for structured data processing. The subjects of this article are the lightweight programming languages suitable for desktop analytic jobs: lightweight databases represented by MySQL, Excel VBA, Python pandas, and esProc.

Now I’ll examine the pros and cons of each to assess their capabilities.

How to Write Simple, Powerful Script Data Sources for BIRT Reports

1. Preface: JVM-Based SQL Functions and Stored Procedures

Some databases, such as MySQL (before version 8.0), don’t have analytic functions. Some others, such as Vertica, don’t support stored procedures. They turn to external Python, R, or other scripting languages to handle complicated data computations. But these scripting languages integrate poorly with Java, the mainstream programming language. Often, the lengthy Java code written to replace SQL functions or stored procedures serves only one computing goal and cannot be reused.

It’s not easy to implement complicated logic even with analytic functions. Here’s a common computing task: find the top N customers whose sales together account for half of the total, and sort them by amount in descending order. Oracle implements it this way: