Day 3 of 101 Days of Python
A question posted on Reddit today involves iterating through HTML tags to find a specific set of information and build a dictionary in Python.
I have a basic weather website request as below:
import requests, bs4, lxml
url = ('https://forecast.weather.gov/MapClick.php?..')
page = requests.get(url)
soup = bs4.BeautifulSoup(page.content, 'lxml')
A large section of the page with a class and id holds the information I'm looking for. I've pulled that out with the below:
weather = soup.find(id='detailed-forecast-body')
I would like to get a dictionary that is like {Today: "a slight chance of ...", Tonight: "mostly clear, with a..."}
I can list all the weather elements from above using the below:
sections = weather.find_all(class_='col-sm-2 forecast-label')
<div class="col-sm-2 forecast-label"><b>Today</b></div>
<div class="col-sm-2 forecast-label"><b>Tonight</b></div>
forecasts = weather.find_all(class_='col-sm-10 forecast-text')
I'm struggling to understand how I can iterate through the weather object, to pull out just the text I want.
Any help is greatly appreciated.
There is great news here: we have 99% of the solution worked out already. We just need to build a dictionary from what has already been extracted with Beautiful Soup.
First, we do need to change one thing. The URL in the question is abbreviated (it ends in ?..), so it does not actually give us the information we need. I grabbed a real URL from forecast.weather.gov so that we actually get the forecast HTML.
So, if we take the code from the question and use a real URL, we get something like this.
import requests, bs4, lxml
# This is the broken url...
#url = ('https://forecast.weather.gov/MapClick.php?..')
url = 'https://forecast.weather.gov/MapClick.php?x=194&y=139&site=gid&zmx=&zmy=&map_x=194&map_y=139#.XrDOw_l7mV4'
page = requests.get(url)
soup = bs4.BeautifulSoup(page.content, 'lxml')
weather = soup.find(id='detailed-forecast-body')
sections = weather.find_all(class_='col-sm-2 forecast-label')
forecasts = weather.find_all(class_='col-sm-10 forecast-text')
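One thing worth guarding against here: soup.find returns None when nothing matches, so if a page does not contain the forecast container, the following weather.find_all call fails with an AttributeError. Below is a minimal sketch of a guard, assuming a hypothetical helper named find_forecast_body and a small made-up HTML string standing in for page.content. It uses the built-in 'html.parser' only so that the snippet runs without lxml installed.

```python
import bs4


def find_forecast_body(html):
    """Return the detailed-forecast container, or raise if it is missing.

    Hypothetical helper for illustration; not part of the original question.
    """
    soup = bs4.BeautifulSoup(html, 'html.parser')
    weather = soup.find(id='detailed-forecast-body')
    if weather is None:
        # soup.find returns None when no element matches
        raise ValueError("no element with id='detailed-forecast-body' found")
    return weather


# Made-up stand-in for page.content from a real request
sample_html = (
    '<div id="detailed-forecast-body">'
    '<div class="col-sm-2 forecast-label"><b>Today</b></div>'
    '</div>'
)

weather = find_forecast_body(sample_html)
labels = weather.find_all(class_='col-sm-2 forecast-label')
```

With a guard like this, a page that lacks the container produces a clear error message rather than a failure a few lines later.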
Let's take a look at sections and forecasts. These two variables contain the HTML for the time periods and the forecast text that we are looking for.
>>> sections
[<div class="col-sm-2 forecast-label"><b>This Afternoon</b></div>, <div class="col-sm-2 forecast-label"><b>Tonight</b></div>, <div class="col-sm-2 forecast-label"><b>Wednesday</b></div>, <div class="col-sm-2 forecast-label"><b>Wednesday Night</b></div>, <div class="col-sm-2 forecast-label"><b>Thursday</b></div>, <div class="col-sm-2 forecast-label"><b>Thursday Night</b></div>, <div class="col-sm-2 forecast-label"><b>Friday</b></div>, <div class="col-sm-2 forecast-label"><b>Friday Night</b></div>, <div class="col-sm-2 forecast-label"><b>Saturday</b></div>, <div class="col-sm-2 forecast-label"><b>Saturday Night</b></div>, <div class="col-sm-2 forecast-label"><b>Sunday</b></div>, <div class="col-sm-2 forecast-label"><b>Sunday Night</b></div>, <div class="col-sm-2 forecast-label"><b>Monday</b></div>]
>>> forecasts
[<div class="col-sm-10 forecast-text">Sunny, with a high near 71. Breezy, with a northwest wind 15 to 20 mph, with gusts as high as 25 mph. </div>, <div class="col-sm-10 forecast-text">A chance of showers and thunderstorms before 11pm, then a slight chance of showers between 11pm and 2am. Mostly cloudy, then gradually becoming mostly clear, with a low around 42. Northwest wind 5 to 10 mph. Chance of precipitation is 30%. New precipitation amounts of less than a tenth of an inch, except higher amounts possible in thunderstorms. </div>, <div class="col-sm-10 forecast-text">Sunny, with a high near 64. North wind 10 to 15 mph, with gusts as high as 20 mph. </div>, <div class="col-sm-10 forecast-text">A 30 percent chance of showers after 1am. Increasing clouds, with a low around 43. North northeast wind 5 to 10 mph becoming southeast after midnight. New precipitation amounts of less than a tenth of an inch possible. </div>, <div class="col-sm-10 forecast-text">A chance of showers, with thunderstorms also possible after 1pm. Mostly cloudy, with a high near 62. South southeast wind 10 to 15 mph, with gusts as high as 20 mph. Chance of precipitation is 40%. New rainfall amounts of less than a tenth of an inch, except higher amounts possible in thunderstorms. </div>, <div class="col-sm-10 forecast-text">A 30 percent chance of showers and thunderstorms before 1am. Mostly cloudy, with a low around 37. New rainfall amounts of less than a tenth of an inch, except higher amounts possible in thunderstorms. </div>, <div class="col-sm-10 forecast-text">Sunny, with a high near 59.</div>, <div class="col-sm-10 forecast-text">Areas of frost after 5am. Otherwise, mostly clear, with a low around 36.</div>, <div class="col-sm-10 forecast-text">Areas of frost before 8am. 
Otherwise, mostly sunny, with a high near 69.</div>, <div class="col-sm-10 forecast-text">Mostly cloudy, with a low around 41.</div>, <div class="col-sm-10 forecast-text">Mostly sunny, with a high near 60.</div>, <div class="col-sm-10 forecast-text">Mostly clear, with a low around 36.</div>, <div class="col-sm-10 forecast-text">Isolated showers. Partly sunny, with a high near 60. Chance of precipitation is 20%.</div>]
So all we really need to do is the last 1% of the work, which is to build a dictionary where we get the keys from sections, and the values from forecasts.
Let's start by building a list of the strings we want from each. From sections, we can build a list of strings to use as dictionary keys (named time_periods), and from forecasts we can build a list of strings to use as dictionary values (named time_period_forecasts).
# Keys: the text inside the <b> tag of each label div
time_periods = []
for section in sections:
    time_periods += section.contents[0].contents

# Values: the text inside each forecast div
time_period_forecasts = []
for forecast in forecasts:
    time_period_forecasts += forecast.contents
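As an alternative to walking .contents by hand, Beautiful Soup's get_text method returns the text of a tag directly. Here is a minimal sketch, assuming a small made-up HTML sample in place of the live page and the built-in 'html.parser' so it runs without lxml. Passing strip=True also trims the trailing spaces that show up in some of the forecast strings.

```python
import bs4

# Made-up sample that mimics the structure of the forecast page
sample_html = '''
<div id="detailed-forecast-body">
  <div class="col-sm-2 forecast-label"><b>Today</b></div>
  <div class="col-sm-10 forecast-text">Sunny, with a high near 71. </div>
  <div class="col-sm-2 forecast-label"><b>Tonight</b></div>
  <div class="col-sm-10 forecast-text">Mostly clear, with a low around 42.</div>
</div>
'''

soup = bs4.BeautifulSoup(sample_html, 'html.parser')
weather = soup.find(id='detailed-forecast-body')
sections = weather.find_all(class_='col-sm-2 forecast-label')
forecasts = weather.find_all(class_='col-sm-10 forecast-text')

# get_text collects all text inside a tag; strip=True trims whitespace
time_periods = [s.get_text(' ', strip=True) for s in sections]
time_period_forecasts = [f.get_text(' ', strip=True) for f in forecasts]
```

Unlike the loops above, this produces plain Python strings, and it always yields exactly one string per tag even if a forecast div contains nested markup.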
Finally, we need to put all of this together into the dictionary. Here we are just iterating through the index of each element in the time_periods and time_period_forecasts lists (this can be done many other ways). We could also improve this by adding a check that the lengths of the two lists are the same as well, but we will skip that for now.
# The Dictionary
d = {}
for i in range(len(time_periods)):
    d[time_periods[i]] = time_period_forecasts[i]
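One of the "many other ways" is to pair the two lists with zip and hand the pairs to dict. The sketch below also includes the length check mentioned above. The function name build_forecast_dict and the two short sample lists are made up for illustration.

```python
def build_forecast_dict(time_periods, time_period_forecasts):
    """Pair each time period with its forecast text."""
    if len(time_periods) != len(time_period_forecasts):
        # zip would silently drop the extra items, so fail loudly instead
        raise ValueError(
            f'{len(time_periods)} time periods but '
            f'{len(time_period_forecasts)} forecasts'
        )
    return dict(zip(time_periods, time_period_forecasts))


# Made-up sample data in the same shape as the lists built above
time_periods = ['Today', 'Tonight']
time_period_forecasts = [
    'Sunny, with a high near 71.',
    'Mostly clear, with a low around 42.',
]

d = build_forecast_dict(time_periods, time_period_forecasts)
```

The explicit check matters because zip stops at the shorter input, which would otherwise hide a missing label or forecast.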
Let's see if that worked. This is a nice opportunity to use Python's pretty print module pprint, since some of our forecasts are quite long.
>>> import pprint
>>> pprint.pprint(d)
{'Friday': 'Mostly sunny, with a high near 61.',
 'Friday Night': 'Mostly clear, with a low around 37.',
 'Monday': 'A 20 percent chance of showers. Partly sunny, with a high near '
           '60.',
 'Saturday': 'A 20 percent chance of showers after 1pm. Mostly sunny, with a '
             'high near 64.',
 'Saturday Night': 'A 20 percent chance of showers and thunderstorms. Partly '
                   'cloudy, with a low around 41.',
 'Sunday': 'Partly sunny, with a high near 61.',
 'Sunday Night': 'Mostly clear, with a low around 36.',
 'Thursday': 'A 30 percent chance of showers. Cloudy, with a high near 59. '
             'New precipitation amounts of less than a tenth of an inch '
             'possible. ',
 'Thursday Night': 'A chance of showers and thunderstorms before 1am, then a '
                   'slight chance of showers. Mostly cloudy, with a low '
                   'around 38. Chance of precipitation is 30%.',
 'Tonight': 'Mostly clear, with a low around 43. Breezy, with a northwest wind '
            '15 to 20 mph, with gusts as high as 25 mph. ',
 'Tuesday': 'Sunny, with a high near 70. Breezy, with a north northwest wind '
            '15 to 20 mph, with gusts as high as 25 mph. ',
 'Tuesday Night': 'A chance of showers and thunderstorms before midnight, then '
                  'a slight chance of showers between midnight and 2am. '
                  'Partly cloudy, with a low around 42. Northwest wind 5 to 10 '
                  'mph. Chance of precipitation is 30%. New precipitation '
                  'amounts of less than a tenth of an inch, except higher '
                  'amounts possible in thunderstorms. ',
 'Wednesday': 'Sunny, with a high near 64. North wind 10 to 15 mph, with gusts '
              'as high as 25 mph. ',
 'Wednesday Night': 'A 20 percent chance of showers after 1am. Partly cloudy, '
                    'with a low around 42. North northeast wind 5 to 10 mph '
                    'becoming east southeast after midnight. '}
Nice!