15 Places to Find Free Datasets for Your Data Science Projects

If you’ve ever worked on a personal data science project, you’ve probably spent a lot of time scouring the internet for interesting datasets to analyze.

It can be fun to sift through dozens of datasets to find the best fit, but it can also be frustrating to download and import multiple CSV files only to find that the data is incomplete or just not that interesting. Fortunately, there are online repositories that curate datasets and (mostly) weed out the uninteresting ones.

Using MQTT for IIOT Apps

Introduction to MQTT

MQTT is a communication protocol that has taken off in the IIoT community. It’s a lightweight, efficient protocol that works through a publisher/broker/subscriber model. It creates an easy way for field devices to communicate and retrieve data from a single location. In this tutorial, we’ll go over MQTT and dive into an example of how you can publish data using a groov EPIC PAC and retrieve the data using an MQTT client.
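To give a taste of the publish side before diving in, here is a minimal sketch using the Eclipse Paho Java client. The broker address, client ID, and topic are placeholders, and the tutorial itself publishes from a groov EPIC PAC rather than from Java code.

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.MqttMessage;

public class EpicPublisher {
    public static void main(String[] args) throws MqttException {
        // Hypothetical broker address; replace with your broker's host and port.
        String broker = "tcp://broker.example.com:1883";
        MqttClient client = new MqttClient(broker, "epic-publisher-1");

        MqttConnectOptions options = new MqttConnectOptions();
        options.setCleanSession(true);
        client.connect(options);

        // Publish a temperature reading to a hypothetical topic.
        MqttMessage message = new MqttMessage("72.4".getBytes());
        message.setQos(1); // at-least-once delivery
        client.publish("plant/line1/temperature", message);

        client.disconnect();
    }
}
```

Any MQTT client subscribed to plant/line1/temperature on the same broker would receive this value, which is the broker-mediated decoupling that makes the protocol attractive for field devices.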

How MQTT Works

The Open Systems Interconnection (OSI) model describes the way machines and applications communicate with one another. The model was developed with the idea that a provider can use different protocols or software components at each layer without having to re-architect the entire structure. MQTT comes into play at Layer 7 of the model, on top of TCP/IP (Layer 4). This is important because it keeps most of the existing infrastructure in place while redefining how packets are sent between devices.

Alluxio Use Cases Overview: Unify Silos With Data Orchestration

This blog is the first in a series introducing Alluxio as the data platform to unify data silos across heterogeneous environments. The next blog will include insights from PrestoDB committer Beinan Wang to uncover the value for analytics use cases, specifically with PrestoDB as the compute engine.

The ability to quickly and easily access data and extract insights is increasingly important to any organization. With the explosion of data sources, the trends of cloud migration, and the fragmentation of technology stacks and vendors, there has been a huge demand for data infrastructure to achieve agility, cost-effectiveness, and desired performance. 

Demystifying the Repository Pattern in PHP

Introduction

In this article, we will talk about the Repository Pattern and how we implemented it in our Laravel application to solve a scalability problem. The Repository Pattern is one of the most debated patterns, largely because of how it interacts with ORMs; many developers think that not all ORMs are suitable for this type of design.

We discuss this topic in detail below, explaining why and how we implemented it in our application, with a small sketch of the idea right after this introduction.
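To make the idea concrete, here is a minimal, language-agnostic sketch of the pattern (rendered in Java rather than the article's PHP/Laravel, with purely illustrative names): callers depend on a repository interface, and the ORM-backed implementation can be swapped without touching them.

```java
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;

@Entity
class User {
    @Id
    Long id;
    String name;
}

// The abstraction callers depend on; it says nothing about the ORM.
interface UserRepository {
    User findById(long id);
    void save(User user);
}

// One possible implementation, backed by JPA. It can be replaced by an
// in-memory or remote implementation without changing the service layer.
class JpaUserRepository implements UserRepository {
    private final EntityManager em;

    JpaUserRepository(EntityManager em) {
        this.em = em;
    }

    @Override
    public User findById(long id) {
        return em.find(User.class, id);
    }

    @Override
    public void save(User user) {
        em.persist(user);
    }
}
```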

What Does a Transparent and Secure Digital Workplace Look Like?

Over 25 percent of employees don’t trust their employers, and an even greater 50 percent think that their employers aren’t open or upfront with them. Much of this distrust stems from a lack of transparency in the workplace.

In digitally transformed organizations with digitized workflows, decentralized teams, and remote employees, embracing and maintaining transparency across different workplace tools can become even more difficult. 

Data Providers, Java to JavaScript Integration, and The Future of Vaadin Flow

Vaadin Dev Day is an event that provides in-depth technical topics and insights to the Vaadin community. In the latest edition of the event, we learned about data providers for high-performance data access, advanced JavaScript and TypeScript integrations, and the future of Vaadin Flow. Here's a recap of the event.

High-Performance Data Access With Vaadin

In this talk, Simon Martinelli explains how to efficiently connect Vaadin Flow applications to databases using JPA and the DataProvider implementations that are available in the framework.
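As an illustration of the lazy-loading approach such a talk typically covers, here is a hedged sketch that feeds a Vaadin Grid from a Spring Data JPA repository through a callback-based DataProvider. Person and PersonRepository are assumed names, not taken from the talk.

```java
import com.vaadin.flow.component.grid.Grid;
import com.vaadin.flow.data.provider.DataProvider;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.jpa.repository.JpaRepository;

// Assumed entity and repository, for illustration only.
interface PersonRepository extends JpaRepository<Person, Long> {
}

public class PersonGridView {

    private final Grid<Person> grid = new Grid<>(Person.class);

    public PersonGridView(PersonRepository personRepository) {
        grid.setDataProvider(DataProvider.fromCallbacks(
                // Fetch only the window of rows the Grid currently asks for
                // (assumes the offset is a multiple of the page size).
                query -> personRepository
                        .findAll(PageRequest.of(query.getOffset() / query.getLimit(), query.getLimit()))
                        .stream(),
                // The count callback lets the Grid size its scrollbar
                // without loading every row into memory.
                query -> (int) personRepository.count()));
    }
}
```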

Why a Serverless Data API Might be Your Next Database

App development stacks have been improving so rapidly and effectively that today there are a number of easy, straightforward paths to push code to production, on the cloud platform of your choice. But what use are applications without the data that users interact with? Persistent data is such an indispensable piece of the IT puzzle that it’s perhaps the reason the other pieces even exist. 

Enter cloud and internet-scale requirements, which essentially mandate that back-end services be independently scalable, modular subsystems to succeed. Traditionally, this requirement has been extremely difficult to meet for stateful systems. No doubt, database-as-a-service (DBaaS) has made provisioning, operations, and security easier. But as anyone who has tried to run databases on Kubernetes will tell you: auto-scaling databases, especially ones that are easy for developers to use, remain out of reach for mere mortals.

Benefits of Data Ingestion

Introduction

In the last two decades, many businesses have had to change their models as operations have grown more complex. The major challenge companies face today is that large amounts of data are generated from multiple data sources, and analytics tools have had to introduce filters across those sources just to keep the problem in check. Companies need analytics and business intelligence that can reach all of their data sources to make better business decisions.

Obviously, a company needs this data to make decisions based on predicted market trends, market forecasts, customer requirements, future needs, and so on. But how do you get all of your company's data into one place to make a proper decision? Data ingestion consolidates your data and stores it in one place.

Make Analytical Data Available to Everyone by Overcoming the 5 Biggest Challenges [Webinar]

"Data and analytics for all!” — the admirable, new mantra for today’s companies. But it’s not easy to put all of an organization’s analytical data and assets into the hands of everyone that needs it. That’s why embarking on this democratization initiative requires you to be prepared to overcome the five monumental challenges you undoubtedly will face.

Join us for this interactive webcast where we will: explore the recommended components of an all-encompassing, extended analytics architecture; dive into the details of what stands between you and data democratization success; and reveal how a new open data architecture maximizes data access with minimal data movement and no data copies.

Risky Business: Preparedness Lessons Learned from the Florida Water Plant Hack

You’d be hard-pressed to find someone in the IT security space who will argue against the importance of risk preparedness. Unfortunately, more often than not, people talk the talk without walking the proverbial walk. It sounds smart: be ready for potential attacks before they happen. But we have a long way to go to put this sentiment into practice. Accidents are unplanned, and we're never quite as prepared as we should be. The "that will never happen to us" attitude is rampant in the enterprise, especially when it comes to cybersecurity.

Risk preparedness is something organizations need to start taking seriously, as seen in the recent Florida water plant hack, among others. If they don't, the outcomes could be devastating. Imagine a stadium of sick Super Bowl attendees, or worse. While the focus has been largely on protecting big businesses or federal entities with lots of valuable data, no one is truly safe from bad actors, not even local municipalities. In fact, these could be even more dangerous targets when you consider something as serious as compromising a community’s water supply or information theft.

The Role of Relays In Big Data Integration

The very nature of big data integration requires an organization to become more flexible in some ways, particularly when gathering input and metrics from sources as varied as mobile apps, browser heuristics, A/V input, software logs, and more. The number of different methodologies, protocols, and formats that your organization needs to ingest while complying with both internal and government-mandated standards can be staggering.

Is there a clean and discreet way to achieve fast data integration and still reap all of the benefits of big data analytics?

Building an ABAC Policy Using APIs and Java SDK From Machina Tools


Determining what data a user, application, or device can access is one of the most important decisions an organization faces. You don’t have to be a healthcare or financial institution to be responsible for customer data, but to maintain customer trust, your organization may want to treat all customer data as if it were just as sensitive.

The data access problem is complex. Elements of policy may be driven by IT, human resources, legal, or even by finance. Policies might be enforced at different points depending on where data travels and how it is consumed. Policies might be enforced at the network layer through remote-access systems, at the database layer, within cloud infrastructure, or at endpoints like email and files. Most of these platforms inherently implement a permissive security policy.
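To show what such a policy decision can look like in code, here is a deliberately simplified, hypothetical attribute-based check written in plain Java. It is not the Machina Tools API, and every attribute name below is made up for illustration.

```java
import java.util.Map;

// A toy ABAC policy: access is decided by evaluating attributes of the
// subject, the resource, and the environment rather than static roles.
public class AbacPolicy {

    public boolean canAccess(Map<String, String> subject,
                             Map<String, String> resource,
                             Map<String, String> environment) {
        // Deny by default; grant only when every required attribute matches.
        boolean sameDepartment = subject.getOrDefault("department", "")
                .equals(resource.getOrDefault("ownerDepartment", "-"));

        boolean clearedForSensitivity = "confidential".equals(resource.get("classification"))
                ? "high".equals(subject.get("clearance"))
                : true;

        boolean onCorporateNetwork = "corporate".equals(environment.get("network"));

        return sameDepartment && clearedForSensitivity && onCorporateNetwork;
    }
}
```

In a real deployment, the same policy would be authored once and evaluated at each enforcement point (network, database, cloud, endpoint) rather than hard-coded as above.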

The Simple Path to Protecting and Controlling Your Application Data

Whether you’re a software development team lead at a prestigious financial institution assigned to redact personally identifiable information (PII) before releasing the next bankruptcy report, or you're part of a development shop that has just been contracted by a large healthcare organization to help update their systems to meet HIPAA requirements, chances are you’ve been asked to obfuscate sensitive data.

Protecting sensitive data is not an uncommon requirement when building applications. In a recent survey, 71% of companies indicated they utilized encryption for some of their data in transit; 53% utilized encryption for some of their data in storage.
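As a small, hedged illustration of the field-level protection being discussed, here is a sketch that encrypts one sensitive value with AES-GCM using the standard Java Cryptography Architecture. In a real system the key would come from a KMS or HSM rather than being generated inline, and key management is the hard part in practice.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class FieldEncryptor {

    public static void main(String[] args) throws Exception {
        // Generate a throwaway 256-bit AES key (illustration only).
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey key = keyGen.generateKey();

        // 96-bit random nonce, as recommended for GCM.
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));

        // Encrypt a single sensitive field, e.g. a (fake) SSN.
        byte[] ciphertext = cipher.doFinal("123-45-6789".getBytes(StandardCharsets.UTF_8));

        System.out.println(Base64.getEncoder().encodeToString(ciphertext));
    }
}
```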

APIs = Access

To gather insights on the current and future state of API management, we asked IT professionals from 18 companies to share their thoughts. We asked them, "How does your company use APIs?" Here's what they told us:

Data Access

  • More data sources are being accessed through APIs. We communicate with APIs for reading data from and writing data back to those systems. APIs are at the core of how we develop applications, manage DevOps, and communicate with source systems.
  • APIs are quickly becoming the de facto way organizations deliver value in a digital world. Companies create discrete applications and data sets and expose them as a series of API-enabled services. The customer experience is then highly dependent on the strength of the underlying API architecture. We’ve seen this trend before. Web apps exploded in the early 2000s and then mobile apps earlier this decade. Now, APIs are following suit.

    The organizations that master this wave of API development, and deliver compelling experiences via those APIs, will drive a decade of competitive advantage. And as with all new emerging trends, the key to success is part people and process — hiring for API skills and adopting an API-first development practice — and part technology that provides a secure, fast, and scalable API infrastructure.
  • APIs provide access to data, software, and applications to enable the business to run and applications to provide a better user experience and customer experience. 
  • Our platform is a publisher of hundreds of business banking API endpoints as well as a heavy consumer of other industry and utility APIs ourselves. It’s safe to say that without APIs there would be no platform. We’re in the business of connecting businesses and corporate clients with banks, which is only possible over rich API data exchange. The problem is that every bank today has its own way of offering services to clients, so we had to create a layer that enables the delivery of services to many, in many ways.

    In our case, that means connecting any business application to any bank without the hassle of re-implementing the whole product for every bank and client combination. The current state for many is a file-based exchange with banks over a variety of protocols and formats, which in most cases means a custom project for each client to establish a connection between their ERP and bank services or data.
  • We use APIs to request permissioned data and perform transactional capabilities. 
  • As a Data-as-a-Service company, we use APIs in two directions: first, for obtaining data from vendors such as Google and Amazon, and second, for supplying it to our customers, who, for the most part, represent the Marketing Technology industry.

Internal and External Development

  • Primarily mobile development and quite a bit of web. There would be no apps or websites without APIs. Integration is a big issue.
  • We have a cloud-based integration platform that makes extensive use of APIs. Integration historically started with files, then moved to databases and service buses, and now relies heavily on APIs. Many modern software applications provide a set of published REST APIs. These make the perfect integration point because they provide both the transport (HTTPS) and the data format (JSON), whereas in the past with files, integration required two distinct sets of technology: a file transport and a file format. Since many modern software applications publish their APIs, we have built connectors to these applications. The connectors encapsulate the API definition into an easy-to-use package on our integration platform.

    A user wishing to build an integration can select two connectors (for example, Salesforce and Microsoft Dynamics Finance and Operations) and easily map customer information from Salesforce to Dynamics Finance. For APIs where we do not have a connector available, users can set up their own definitions and have our integration platform connect to any API. In addition to these API consumer use cases, we also have API provider capability in our integration platform. This allows a user to define a set of APIs on our platform (for example, to check order status) that they can then make public.

    This published API could then be connected to a backend system such as an ERP. When a customer calls the public-facing API published on our integration platform, the call can be routed to the backend ERP, with translation and security applied, to provide an update on their order.
  • We generally don't use the kits provided by the main providers but rather an HTTP library available in our programming language; for us, that's "requests" in Python. We then read the API documentation for each service and implement the calls needed to make it work. We have developed internal libraries to handle our most frequent usage, which includes Stripe, Mailgun, and Intercom.
  • All the products and services we provide can be accessed via APIs. While we understand the value of web applications to convey meaning in a visual way or to engage with a broad audience, it is in our DNA to first and foremost engage with developers — and we do that by providing them with carefully built and documented APIs to interface their own products with ours. 
  • APIs power every piece of our product ecosystem. We have internal APIs that our web and desktop applications use, as well as public-facing APIs that our customers use to automate and customize their workflows. Beyond our own APIs, we integrate against a number of third-party APIs — from version control systems like GitHub to API gateways like Axway. 
  • We use APIs in two high-level categories. First, internal and public APIs for our product offering: our front-end applications are backed by a powerful set of APIs, and we follow a three-layer API design pattern consisting of 1) Experience APIs, used by our front end to deliver the experience to our customers; 2) Process APIs, where our processing and business logic resides; and 3) System APIs, which use our datastores to access the data. Second, APIs for our internal IT projects to integrate the systems we use. For these projects, we follow the same design pattern, but we build it using our own integration and API manager products, both of which are part of our platform.

Microservices

  • In the last 10 years, APIs have become the universal language for integrating applications. Monolithic applications kill innovation; it's too slow to integrate new things via an ESB. It's important to have 1) distributed integration, 2) container-friendly solutions, and 3) APIs as the key to integration. Customers are moving away from choosing point solutions that solve one problem at a time and are instead looking for full-service, pre-integrated solutions that work.
  • Enterprises have built monolithic applications with a lot of APIs in one binary. When microservices come along, these are cut into small pieces: instead of 50 APIs, you will have five. With serverless, it's an API call and then running a piece of code, one API call to invoke one function. Find the smallest unit of compute shared across monolithic, microservices, and serverless architectures: that's the API.
  • There are two broad ways: we build (and share and consume) differentiating APIs related to our core business, and for everything non-core, we prefer API-first SaaS vendors. In our domain, we maintain an ecosystem of almost 500 microservices that comprise our mass customization platform. These microservices are used by our portfolio businesses — and third-parties fulfilling orders on our behalf – for everything from artwork preparation to shop-floor optimization to shipping rate calculations.

    We’re big believers in providing composable building blocks that our businesses can use to solve problems in novel ways, and we look for the same discipline when buying solutions outside of our domain.  Traditional vendors with monolithic solutions expect you to conform to their platform; API-first vendors allow you to integrate their functionality into yours.

Other

  • APIs support different types of file services. We use them to manage the content lifecycle, from ingest through archiving, data governance, and so forth. Every platform has to expose APIs to build custom applications and integrations. Within the platform itself, there are many services that use APIs.

Easy Access to Data With Spring Data REST

Spring Boot offers fantastic support for accessing data with JPA through repository interfaces. If we add to this the ease with which REST services are created, as explained in this entry, we can very easily build an application that offers an API for accessing our preferred database.

But if we want to implement HATEOAS in our project, or if there are a lot of entry points in our API, we must write a lot of code. To solve this problem, Spring Boot offers the Spring Data REST library. With it, we can build a REST API to access our database easily and without writing much code.
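As a minimal sketch of how little code that takes, the following repository interface is enough for Spring Data REST to expose CRUD endpoints with HATEOAS links under /customers, assuming spring-boot-starter-data-rest is on the classpath and a Customer JPA entity exists (both are assumptions here, not from the article).

```java
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.rest.core.annotation.RepositoryRestResource;

// Customer is an assumed JPA entity; the path is just an example.
// Spring Data REST generates GET/POST/PUT/DELETE endpoints and HAL links for it.
@RepositoryRestResource(path = "customers")
public interface CustomerRepository extends JpaRepository<Customer, Long> {
}
```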

You Don’t Have to Be a Big Corporation to Have a Great Database Migration to the Cloud

IT is the heart of every business. Business records, plans, employee records, and so much more are handled via IT. Even the smallest companies rely on computing to handle all the important aspects of their business.

No matter how convenient IT support is, it still has a stranglehold on the budget. Purchasing and repairing servers and hard drives are increasingly expensive, which is problematic for small businesses.