Why Is Newsdata.io the Best News API for You?

Today I’m going to show you how to access breaking news and headlines, and how to search for articles from over 20,000 news sources and blogs, with the Newsdata.io news API in Python.

There’s an incredible amount that you can do with the Newsdata.io API: showing users top and breaking news headlines, researching potential clients for risks or opportunities, natural language processing, and machine learning. The sky is the limit. What I want to do today is get you started quickly so that you can begin using the news API in Python right away.

I’m also going to give you a heads up on some very important caveats that aren’t necessarily clear in the documentation. So make sure you watch till the end to avoid some very big pitfalls. Let’s go. The first stop is the news API website. It’s really pretty easy to navigate and there’s a lot of good stuff here but there’s a lot that’s not really relevant right away.

So I’m going to give you a quick tour of the site and introduce you to the sections that will be most relevant to you right now. Then, when you’ve played around with the API for a while and want to do some more advanced searching, you can come back and get some more information. If you look across the top, you’ll see several tabbed links.

Getting Started. This section includes example code and output for the three endpoints, which are (1) top breaking news, (2) historical news, and (3) text analysis. These are the three types of requests that you can make. You’ll notice that the output is in JSON format. Ultimately this will get transformed into a Python dictionary.
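To make the JSON-to-dictionary step concrete, here is a minimal sketch using Python’s standard json module. The response body below is a hypothetical stand-in: its field names and values are assumptions for illustration, not copied from the Newsdata.io documentation.

```python
import json

# Hypothetical response body; field names and values are illustrative
# assumptions, not taken from the API documentation.
raw_body = """
{
  "status": "ok",
  "totalResults": 1,
  "articles": [
    {"title": "Example headline", "url": "https://example.com/story"}
  ]
}
"""

# json.loads turns a JSON object into a dict and a JSON array into a list.
data = json.loads(raw_body)

print(type(data).__name__)               # type of the top-level object
print(type(data["articles"]).__name__)   # type of the articles container
print(data["articles"][0]["title"])      # a field inside the first article
```

A client package normally does this parsing for you, so in practice you only ever see the resulting dictionary.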

Some of the print statements in these code examples appear to have been written in Python 2, so be careful when copying and pasting some of this stuff. The examples also use the Requests package. You can do it that way. However, the News API package is even simpler to use and less verbose.

This is the package I would recommend using, especially since it will allow you to call the endpoints as specific methods instead of having to build the request URL yourself.

Documentation. This section provides a lot of information about how to use the API. However, the pages that are going to be most useful to you are (1) the client library page for Python and (2) the reference page for each endpoint. If you click on the Python page, you can see an example of exactly how to get started. You first install the package with pip. Then, to use it, you import the News API client and set up the News API object that will handle the request.

You then have three methods available to you that correspond to the three endpoints we just looked at: breaking news, historical news, and sources. You can see some of the parameters here. These are not all of the optional parameters; this is just an example. To get more information about the parameters and what values you can actually use, you’ll need to navigate to the endpoint pages. So let’s do that now.

The endpoint pages are going to give you a sample output but more importantly, they’re going to explain what all of the request parameters are and examples of values that you can use for these parameters.

I would recommend getting very familiar with these parameters especially because it will allow you to scope your query to the most relevant records. In addition to the parameters, you can find out what the response object is.

You’ll see this when I run some actual code, but the response object in Python is a dictionary that includes all of these components. The first level of the dictionary contains three objects: a string, an integer, and an array of articles, which in Python is a list of dictionaries containing the items that you see here.

News Sources. If you want to restrict your query to a specific news source, then you can. The string that you should pass to the source parameter is indicated here under each news source. You can also specify the country.

Pricing. This tab tells you what you get with different price plans. I’m using a developer account that is free. You can see that there are some differences here but let me just point out some of the big ones.

First of all, paying accounts get the latest news in real time, whereas the developer account has a 15-minute time delay. Not a big deal for me. The developer account can access articles up to a month old, but you’ll need to pay to access anything older than that.

So keep that in mind. With a developer account, you can submit up to 500 requests per day. This number is substantially higher for paying accounts: it’s in the hundreds of thousands.

This may not seem like a big deal but there is a catch so hang on till the end of this video and I’ll demonstrate the undocumented search limit caps and also give some advice on how to optimize your query results in order to work around some of the developer account search limits.

Login. Finally, there’s the login screen. After you register for an API key, which I’ll show you in just a second, you can log in and see a few interesting bits of information. First, you can see your account type and your API key. And apparently, you can regenerate an API key if it gets lost, stolen, or whatever. Second, there’s a chart that shows your usage history.

This is important if you’re making a lot of requests in a single day, especially with that 500 requests per day limit with a developer account. Finally, there’s a feed that shows Twitter updates for this API. Apparently, a package was recently developed for “R”. So that’s cool if you’re using R. Okay, now that we’ve done a tour of the website, let’s get started.

The first thing you’re gonna want to do is register for a free API key by clicking on the blue “Register to get API key” button on the website. A very brief form is going to open up and after you fill it out, Newsdata.io will email you your API key which you can then copy and paste into your code.

Next, you’re gonna need to install the package. So open up your command prompt and type: pip install newspaper-python. And, after it’s installed, let’s start coding. Open up your IDE, code editor, Jupyter notebook, whatever it is that you want to code in.

I’m going to import the package I need by typing “from newspaper import NewsApiClient”. I’ve saved my API key in a module called “key”, so I’m going to import that as well. I’m going to show you how to do something with one of the date fields, so I’m going to import the “datetime” module.

Finally, I’m going to do something in pandas at the end of this video, so let’s import pandas as well. Next, let’s create a news API client object, which I’m gonna use to handle the query request. Type newsapi = newsdata.io/api/1/news?apikey=YOUR_API_KEY&.. As I said, I’ve saved my API key in a variable.

You can enter yours here as a string if you wish. Next, let’s create a search request. There are three different methods that you can use to fetch data with a news API object: (1) get_top_headlines, (2) get_everything, and (3) get_sources. These correspond to the endpoints that we saw on the News API website. I’m going to go with “get_everything”. Note that the documentation lists the request parameters in camelCase, whereas the author of the Python package has translated them into the Pythonic snake_case convention, so just be cognizant of that naming difference or you’ll be banging your head against the wall trying to figure out what the problem is. Okay, let’s examine the objects. If I print the type of this data object, you can see that it’s a dict.

And since that is the case, I can use the keys() method to see what keys are available. You can see that there are three objects inside and, if you remember from the documentation we looked at on the website, the status is a string, the total results value is an integer, and the articles object is an array, which in Python is a list of dictionaries.

If I access the values of these first two keys, you’ll see that I’m getting “ok” as the status and 22 as the total results. You’ll also notice that the field names are still in camelCase. If I print the type() of the articles object, you can see that it’s a list. If I show the first item in the articles list, you can see that it’s a dictionary with key-value pairs. So in order to make this a little bit easier, I’m going to save a copy of the articles to a new list I’m calling “articles”.
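The inspection steps above can be sketched as follows. Since the real call needs an API key, the dictionary here is a hand-written stand-in for the response; the key names follow the camelCase fields mentioned above, and the article contents are made up.

```python
# Stand-in for the dictionary returned by the client. The key names and
# article contents are assumptions for illustration only.
data = {
    "status": "ok",
    "totalResults": 22,
    "articles": [
        {"title": "First example headline", "author": "A. Writer"},
        {"title": "Second example headline", "author": None},
    ],
}

print(type(data))                             # top-level object is a dict
print(list(data.keys()))                      # the three top-level keys
print(data["status"], data["totalResults"])   # a string and an integer
print(type(data["articles"]))                 # the articles value is a list
print(data["articles"][0])                    # each item is a dict

# Keep a separate reference to the list so later code is less verbose.
articles = data["articles"]
```

Note that totalResults reports how many articles matched the query, which can be larger than the number of articles actually returned in one response.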

Now I’m going to iterate through each article in the list of articles and print an index number and then the article title. All of the other items in the dictionary can be accessed in the same way. Let’s look at the first article and print every key-value pair in that article. Everything here is pretty straightforward.
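Here is a minimal sketch of that loop. The two articles are made-up sample data, and the field names are assumptions chosen to resemble the ones described in this walkthrough.

```python
# Made-up sample data standing in for the "articles" list.
articles = [
    {"title": "First example headline", "author": "A. Writer",
     "url": "https://example.com/first"},
    {"title": "Second example headline", "author": None,
     "url": "https://example.com/second"},
]

# Print an index number followed by each article title.
lines = [f"{index} {article['title']}" for index, article in enumerate(articles)]
for line in lines:
    print(line)

# Print every key-value pair of the first article.
for key, value in articles[0].items():
    print(f"{key}: {value}")
```

If a field might be missing from some articles, article.get("title") avoids a KeyError by returning None instead.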

You can see the source, author, title, description, URL, URL to the image, publish date, and content. Let me talk for a second about a couple of these things. The URL is a direct link to the article. So if I were to copy and paste that into the browser, it would take me directly to the article.

The URL to the image is also a direct link, and you can use it to add that image to your content. For example, if I add that URL to a markup tag, you can see the image pop right up. You could tie this to your HTML tag as well if you’re doing web development. You can convert the publish date into a Python datetime object by importing the datetime library, as I have, and converting it with the strptime() method.
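As a rough sketch of the date conversion: the timestamp string and its format below are assumptions (an ISO-8601 style value ending in Z), so adjust the format string to match whatever your responses actually contain.

```python
from datetime import datetime

# Assumed timestamp format; check a real response before relying on it.
published_at = "2018-01-15T09:30:00Z"

# strptime parses a string into a datetime using the given format codes.
published = datetime.strptime(published_at, "%Y-%m-%dT%H:%M:%SZ")

# strftime goes the other way and produces a friendlier display string.
friendly = published.strftime("%d %B %Y, %H:%M")

print(published.year, published.month, published.day)
print(friendly)
```

If the format string does not match the incoming text, strptime raises a ValueError, which is a quick way to discover that the assumed format is wrong.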

This will not only let you print a more user-friendly date but then you can manipulate the date and get various parts of it to suit your needs. If you look at the content object, you’ll notice that it gives you quite a bit of text but not all of it. Instead, you get something that indicates an additional number of words.

I’m not certain because I have a developer account, but I believe that this is as much content as you get, which in this case is only about 275 characters. However, I can verify that the search results do include all text in the original articles.

Finally, I thought I’d show you one more cool thing that you can do with this object. Since these articles are stored in a list of dictionaries, they are easily imported into a pandas data frame by typing df = pd.DataFrame(articles). And when it’s in a data frame, the entire world of data manipulation in pandas and NumPy is available to you. Plus it just looks really nice.

Okay, I know this is a long video, but this last part is something that you really need to see if you want to avoid getting burned by the free developer account search limits. Here we go.

As of the making of this video, if I were to search for the phrase “iPhone X”, I would get more than 5,000 results, and that’s just in the last 30 days. The Newsdata.io API will distribute these results into a number of pages, similar to what happens when you do a Google search. You can adjust the number of results that appear on each page, up to a maximum of 100.

When I tried this, I thought “okay, no problem, I’ll just iterate over the pages and extend the results list with the list.extend() method and that will give me the complete set of results.” However, I ran into a problem. The developer account is limited to a max of 100 results, period.

So if you have 5,000 results, you’re only able to see the first 100. I couldn’t find this limit posted anywhere on the website. It may be there, but I couldn’t find it. However, when I tried to pull in page two of the results, the error message was very clear: if I want more than 100 results, I need to upgrade to a paying account.
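The paging approach and the cap can be illustrated with a small simulation. Nothing below talks to the real API: fetch_page, UpgradeRequired, and the constants are all hypothetical stand-ins that mimic the behaviour described above.

```python
class UpgradeRequired(Exception):
    """Hypothetical error standing in for the API's upgrade message."""


TOTAL_RESULTS = 5000   # pretend the search matched 5,000 articles
DEVELOPER_CAP = 100    # results a developer account may retrieve


def fetch_page(page, page_size=100):
    """Simulated API call returning one page of fake articles (pages start at 1)."""
    start = (page - 1) * page_size
    if start >= DEVELOPER_CAP:
        raise UpgradeRequired("developer accounts are limited to 100 results")
    end = min(start + page_size, TOTAL_RESULTS)
    return [{"title": f"Article {i}"} for i in range(start, end)]


def collect_all(page_size=100):
    """Walk the pages, extending one list, until the results or the cap run out."""
    results = []
    page = 1
    while len(results) < TOTAL_RESULTS:
        try:
            batch = fetch_page(page, page_size)
        except UpgradeRequired as err:
            print(f"Stopped at page {page}: {err}")
            break
        if not batch:
            break
        results.extend(batch)
        page += 1
    return results


articles = collect_all()
print(len(articles))
```

Catching the error and keeping what was already collected means a script still ends up with the first 100 results instead of crashing on page two.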

However, there are a few things that you can do to make your search more relevant and therefore reduce the size of your results. First, make sure that you’re more specific with keywords. Instead of searching for “iPhone X” perhaps search for “iPhone X reviews” or “iPhone X reviews space grey”. You can also limit the time scale.

You may only want to see the articles published in the last week. If so, you can adjust those settings. Perhaps you only care about articles published by a particular source or magazine, or in a certain language. These are all things that you can do to increase the relevance of your search results and therefore reduce the total volume of results, making that developer cap less of a nuisance.

And that, my friends, is everything you need to get started using the Newsdata.io news API in Python. Please check the video description for all relevant links to the API website and documentation.
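One way to keep those narrowing options together is a small helper that builds the query parameters. This is only a sketch: build_query is a hypothetical function, and the parameter names (“q”, “from”, “to”, “language”, “sources”) are assumptions, so check the endpoint reference for the names your client actually expects.

```python
from datetime import date, timedelta


def build_query(keywords, days_back=7, language="en", source=None, today=None):
    """Build a dict of search parameters; the parameter names are assumed."""
    today = today or date.today()
    params = {
        "q": keywords,                                            # specific keywords
        "from": (today - timedelta(days=days_back)).isoformat(),  # start of window
        "to": today.isoformat(),                                  # end of window
        "language": language,
    }
    if source is not None:
        params["sources"] = source                                # single source only
    return params


# A fixed "today" keeps the example reproducible.
params = build_query("iPhone X reviews", today=date(2018, 1, 15))
print(params)
```

Passing a fixed date as today makes the helper easy to test; in normal use you would leave it out and let it default to the current date.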

How do you plan on using this Newsdata.io news API? Please leave your thoughts in the comments below.