...


Project Solution Review

Get the solution to all components of the "Web Crawler" project, along with a detailed explanation.

Solution: Add URLs to queue

In this component, you were required to create a Celery queue and add tasks to it so that they can later be fetched by a worker for processing.

The following is the solution:


import celery
import requests

app = celery.Celery('celery-proj',
                    broker='redis://localhost',
                    backend='redis://localhost')

@app.task()
def getURL(url_to_crawl):
    dic = {}
    r = requests.get(url=url_to_crawl)
    text = r.text
    dic['data'] = text
    dic['status_code'] = r.status_code
    return dic

if __name__ == '__main__':
    urls = ["http://educative.io", "http://example.org/", "http://example.com"]

    results = []
    for url in urls:
        results.append(getURL.delay(url))

    for result in results:
        print("Task state: %s" % result.state)
Solution: Create a Celery queue and add tasks (URLs to fetch)
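
As a quick usage note, the sketch below shows one way the queued results could be collected once a worker is running. It is not part of the original solution: it assumes it sits in the same module as the code above, that a Redis server and a Celery worker are already running, and the 10-second timeout is an arbitrary choice.

# A minimal sketch, assuming it lives in the same module as the solution above
# and that a Redis server and a Celery worker are already running.
if __name__ == '__main__':
    urls = ["http://educative.io", "http://example.org/", "http://example.com"]

    # delay() enqueues each task and returns an AsyncResult immediately.
    results = [getURL.delay(url) for url in urls]

    for url, result in zip(urls, results):
        # get() blocks until the worker finishes the task or the timeout expires.
        response = result.get(timeout=10)  # timeout value is an assumption
        print("%s -> HTTP %s, %d characters fetched"
              % (url, response['status_code'], len(response['data'])))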

Solution explanation

The solution to this component can be divided into three sub-parts:

  1. On lines 4 - 6, a Celery queue is created that uses Redis as both the broker and the backend. We also start the Redis server in the background using the command redis-server --daemonize yes.

  2. On lines 8 - 15, a Celery task is defined. This includes the task declaration @app.task() followed by the function definition that workers would be ...