How Open Source Can Help You Scrape LinkedIn into a PostgreSQL Database

“Data” is changing the face of our world. It might be part of a study helping to cure a disease, boost a company’s revenue, make a building more efficient, or drive the ads you keep seeing. To take advantage of data, the first step is to gather it, and that’s where web scraping comes in.

This recipe teaches you how to easily build an automatic data scraping pipeline using open source technologies. In particular, you will be able to scrape user profiles on LinkedIn and move those profiles into a relational database such as PostgreSQL. You can then use this data to drive geo-specific marketing campaigns or to raise awareness of a new product feature based on job titles.
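The "move profiles into a relational database" step can be sketched as a small Python helper that turns one scraped profile into a parameterised SQL statement. The table name `profiles` and its columns are illustrative assumptions, not part of the original recipe, and the `psycopg2` calls in the comment show roughly how such a statement would be executed against PostgreSQL:

```python
# Sketch: mapping one scraped profile (a dict) into a parameterised
# PostgreSQL INSERT. Table and column names are illustrative assumptions.

def profile_to_insert(profile):
    """Return (sql, params) for inserting one scraped profile row."""
    columns = ("full_name", "job_title", "location")
    params = tuple(profile.get(col) for col in columns)
    sql = (
        "INSERT INTO profiles (full_name, job_title, location) "
        "VALUES (%s, %s, %s)"
    )
    return sql, params

# With psycopg2 the statement would be executed roughly like this:
#   import psycopg2
#   conn = psycopg2.connect(dbname="scraping", user="postgres")
#   with conn, conn.cursor() as cur:
#       sql, params = profile_to_insert(profile)
#       cur.execute(sql, params)

sql, params = profile_to_insert(
    {"full_name": "Ada Lovelace", "job_title": "Engineer", "location": "London"}
)
```

Using placeholders (`%s`) and a params tuple, rather than string formatting, lets the database driver escape values safely.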

Using Coronavirus Quarantines to Boost Your Skills

The global coronavirus outbreak has had a major impact on the economy. Many countries have gone into lockdown, and no industry has been spared. Even the programming world is facing major setbacks. Individual programmers, meanwhile, may be struggling socially or physically after two or three weeks of quarantine or working from home. But we can turn this time to our advantage by learning new skills online. All you need is a computer and a good internet connection.

Though the programming world may not be facing too many problems, the overall IT industry is. Looking at developer hiring statistics in the USA, hiring fell by nearly 70,000 in February after rising by 52,000 in January. This drop in developer hiring has had a huge impact on demand for developers and affects organizational strength as well.

Web Scraping with Python, Beautiful Soup, and Urllib3

In this day and age, information is key. Through the internet, we have a nearly unlimited amount of information and data at our disposal. The problem is that this abundance can overwhelm us as users. Fortunately, programmers can develop scripts that sort, organize, and extract this data for us. Work that would take hours by hand can be accomplished in just over 50 lines of code and run in under a minute. Today, using Python, Beautiful Soup, and Urllib3, we will do a little web scraping and even scratch the surface of extracting data into an Excel document.
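To make the fetch-and-parse idea concrete, here is a minimal sketch. The markup below is a tiny hand-written stand-in for the book listing on books.toscrape.com (so the example is self-contained); in a real run, the commented urllib3 lines would fetch the live page instead:

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# In a real run, fetch the page with urllib3, e.g.:
#   import urllib3
#   http = urllib3.PoolManager()
#   html = http.request("GET", "https://books.toscrape.com/").data

# Tiny stand-in for the page markup, mimicking the site's structure.
html = """
<ol class="row">
  <li><article class="product_pod">
    <h3><a title="A Light in the Attic">A Light...</a></h3>
  </article></li>
  <li><article class="product_pod">
    <h3><a title="Tipping the Velvet">Tipping...</a></h3>
  </article></li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")
# Each book title lives in the `title` attribute of the anchor tag.
titles = [a["title"] for a in soup.select("article.product_pod h3 a")]
print(titles)  # ['A Light in the Attic', 'Tipping the Velvet']
```

The CSS selector `article.product_pod h3 a` matches the real site's listing markup, but the HTML snippet itself is fabricated for illustration.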

Research

The website we will be working with is books.toscrape.com, a site built specifically for practicing web scraping. Before we begin, note that we won't be rotating our IP addresses or user agents. On other websites, though, doing so may be a good idea, since they will most likely block you if you aren't "polite." (I'll cover the concept of politeness in later posts. For now, just know that it means spacing out the time between your individual requests.)
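The idea of "polite" scraping described above can be sketched as a small helper that pauses between requests. The function name, the `fetch` callback, and the default delay are all illustrative assumptions, not from any particular library:

```python
import time

def polite_fetch(urls, fetch, delay_seconds=2.0):
    """Call fetch(url) for each URL, pausing between requests.

    `fetch` is any callable that takes a URL and returns its content;
    the delay keeps the scraper from hammering the target server.
    """
    results = []
    for i, url in enumerate(urls):
        if i:  # no need to sleep before the very first request
            time.sleep(delay_seconds)
        results.append(fetch(url))
    return results

# Usage sketch with a stand-in fetch function and no delay:
calls = polite_fetch(["a", "b", "c"], fetch=str.upper, delay_seconds=0)
```

In a real scraper, `fetch` would wrap an urllib3 request, and a fixed or randomized delay of a second or two is a common starting point.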