Douban Movie Top 250 Crawler PPT

    import requests
    from bs4 import BeautifulSoup

    # Send the request and get the HTML content
    url = "https://movie.douban.com/top250"
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"}
    response = requests.get(url, headers=headers)
    response.raise_for_status()

    # Parse the HTML content with BeautifulSoup
    soup = BeautifulSoup(response.text, "html.parser")

    # Extract the movie list
    movies = soup.select(".grid_view li")

    # Iterate over the movie list and print the information
    for movie in movies:
        title = movie.select_one(".title").text
        rating = movie.select_one(".rating_num").text
        info = movie.select_one(".bd p").text.strip()
        print(f"{title}: {rating} ({info})")

This code uses Python's requests library to send the HTTP request and the BeautifulSoup library to parse the HTML. It first sends a GET request to the Douban Movie Top 250 URL and gets the returned HTML. It then uses BeautifulSoup selectors to extract the movie list, iterates over each movie, and prints its title, rating, and info line. Note that this is only a simple example; in real use it may need to be adapted to the page structure and data format.

Beyond the basic information, the crawler can be extended further, for example to extract each movie's detailed information, images, and reviews. A fuller version would keep the request and parsing steps above and extract, in addition to the title and rating, each movie's year, director, cast, synopsis, and image link.

Because the page structure of Douban Movie Top 250 may change, the selectors in the code may need to be adjusted to match the actual page. In addition, network requests and page parsing can take a long time, so real use may need appropriate exception handling and a timeout setting.
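The extension and the timeout and exception handling mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the original author's code. The function names (`fetch_html`, `parse_movies`, `text_or_none`) are hypothetical. The `.inq` and `img` selectors, and the idea that `.bd p` holds a director/cast line followed by a line starting with the year, are assumptions about the page markup that should be checked against the live page. The short blurb taken from `.inq` stands in for the synopsis.

```python
import re

import requests
from bs4 import BeautifulSoup

URL = "https://movie.douban.com/top250"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}


def fetch_html(url=URL, timeout=10.0):
    """Download a page; return None on a network error, timeout, or HTTP error."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=timeout)
        response.raise_for_status()
    except requests.RequestException as exc:
        print(f"Request failed: {exc}")
        return None
    return response.text


def text_or_none(node, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    found = node.select_one(selector)
    return found.get_text(strip=True) if found else None


def parse_movies(html):
    """Parse one list page into a list of dicts (selectors are assumptions)."""
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    for item in soup.select(".grid_view li"):
        # Assumed layout of ".bd p": a director/cast line, then a line
        # that starts with the release year.
        info_node = item.select_one(".bd p")
        raw = info_node.get_text("\n") if info_node else ""
        lines = [line.strip() for line in raw.splitlines() if line.strip()]
        year_match = re.search(r"\d{4}", lines[1]) if len(lines) > 1 else None
        img = item.select_one("img")
        movies.append({
            "title": text_or_none(item, ".title"),
            "rating": text_or_none(item, ".rating_num"),
            "credits": lines[0] if lines else None,
            "year": year_match.group(0) if year_match else None,
            "blurb": text_or_none(item, ".inq"),  # assumed class name
            "poster": img.get("src") if img else None,
        })
    return movies
```

A typical call would be `html = fetch_html()` followed by `parse_movies(html)` when `html` is not `None`. Keeping the download and the parsing in separate functions means the selectors can be tried on a saved copy of the page without sending a request each time.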