Project Solution Review
Get the solution to all the components of the "Web Crawler" project, along with a detailed explanation of each.
Solution: Add URLs to queue
In this component, you were required to create a `celery` queue and add tasks to the queue so that they can be fetched later by a worker for processing.
Following is the solution:
```python
import celery
import requests

app = celery.Celery('celery-proj',
                    broker='redis://localhost',
                    backend='redis://localhost')

@app.task()
def getURL(url_to_crawl):
    dic = {}
    r = requests.get(url=url_to_crawl)
    text = r.text
    dic['data'] = text
    dic['status_code'] = r.status_code
    return dic

if __name__ == '__main__':
    urls = ["http://educative.io", "http://example.org/", "http://example.com"]
    results = []
    for url in urls:
        results.append(getURL.delay(url))
    for result in results:
        print("Task state: %s" % result.state)
```
Solution: Create celery queue and add tasks (URLs to fetch)
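Note that the script above only enqueues the tasks and prints their states; it does not execute them itself. For the tasks to actually run, the `redis` broker and at least one `celery` worker must be running. Below is a minimal usage sketch, assuming the solution file is saved under a hypothetical name such as `crawler.py` and a worker has been started with `celery -A crawler worker --loglevel=info`:

```python
# Usage sketch. Assumptions: the solution above is saved as crawler.py and a
# worker is already running, e.g. started with:
#   celery -A crawler worker --loglevel=info
from crawler import getURL

# .delay() enqueues the task and immediately returns an AsyncResult handle.
result = getURL.delay("http://example.com")

# .get() blocks until a worker finishes the task and returns the dictionary
# built inside getURL, e.g. {'data': '<!doctype html>...', 'status_code': 200}.
response = result.get(timeout=30)

print(response['status_code'])
print(len(response['data']))
```

Without a running worker, `result.state` stays `PENDING` and `result.get()` simply blocks until it times out.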
Solution explanation
The solution to this component can be divided into three sub-parts:
- On lines 4-6, a `celery` queue is created that has `redis` as the backend and broker. We also start the `redis` server in the background using the command `redis-server --daemonize yes`. (An alternative broker/backend configuration is sketched below.)
- On lines 8-15, a `celery` task is defined. This includes the declaration of the task with the `@app.task()` decorator, followed by the function definition that workers will execute when they pick a task off the queue.
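The broker and backend URLs in the solution assume a `redis` server listening on `localhost` on its default port. As a minimal sketch (the port and database numbers below are illustrative assumptions, not part of the original solution), the same app could be pointed at an explicit port and at separate `redis` databases for the broker and the result backend:

```python
import celery

# Illustrative alternative configuration: redis on its default port 6379, with
# database 0 used as the message broker and database 1 as the result backend.
app = celery.Celery(
    'celery-proj',
    broker='redis://localhost:6379/0',
    backend='redis://localhost:6379/1',
)
```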