...


Project Solution Review

Get the solution to all components of the "Web Crawler" project, along with a detailed explanation.

Solution: Add URLs to queue

In this component, you were required to create a Celery queue and add tasks to it so that they can later be fetched by a worker for processing.

The following is the solution:


import celery
import requests

app = celery.Celery('celery-proj',
                    broker='redis://localhost',
                    backend='redis://localhost')

@app.task()
def getURL(url_to_crawl):
    dic = {}
    r = requests.get(url=url_to_crawl)
    text = r.text
    dic['data'] = text
    dic['status_code'] = r.status_code
    return dic

if __name__ == '__main__':
    urls = ["http://educative.io", "http://example.org/", "http://example.com"]

    results = []
    for url in urls:
        results.append(getURL.delay(url))

    for result in results:
        print("Task state: %s" % result.state)
Solution: Create a Celery queue and add tasks (URLs to fetch)
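
As a quick usage note, the sketch below shows one way the queued results could be collected once a worker is running. It is not part of the original solution: it assumes it sits in the same module as the code above, that a Redis server and a Celery worker are already running, and the 10-second timeout is an arbitrary choice.

# A minimal sketch, assuming it lives in the same module as the solution above
# and that a Redis server and a Celery worker are already running.
if __name__ == '__main__':
    urls = ["http://educative.io", "http://example.org/", "http://example.com"]

    # delay() enqueues each task and returns an AsyncResult immediately.
    results = [getURL.delay(url) for url in urls]

    for url, result in zip(urls, results):
        # get() blocks until the worker finishes the task or the timeout expires.
        response = result.get(timeout=10)  # timeout value is an assumption
        print("%s -> HTTP %s, %d characters fetched"
              % (url, response['status_code'], len(response['data'])))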

Solution explanation

The solution to this component can be divided into three sub-parts:

  1. On lines 4 - 6, a Celery queue is created that uses Redis as both the broker and the backend. We also start the Redis server in the background using the command redis-server --daemonize yes.

  2. On lines 8 - 15, a Celery task is defined. This includes the task declaration @app.task() followed by the function definition that workers would be ...