# Scrapy Pipeline
The Item Pipeline processes and saves data after the Spider yields it. Each item passes through the pipeline stages you define.
## Enable a Pipeline
In settings.py, uncomment and configure ITEM_PIPELINES:
```python
# Enable a pipeline (uncomment and set priority)
ITEM_PIPELINES = {
    "my_crawler.pipelines.MyCrawlerPipeline": 300,
}
```
The number (300) sets the pipeline's priority: lower numbers run first. By convention, values fall in the 0–1000 range.
## Write a Pipeline Class
Edit pipelines.py:
```python
class MyCrawlerPipeline:
    def process_item(self, item, spider):
        # Process or save the item here
        return item
```
`process_item()` is called for every item the spider yields. It must either return the item (possibly modified) or raise `DropItem` (from `scrapy.exceptions`) to discard it.
## Example: Save to CSV
```python
import csv

class CsvPipeline:
    def open_spider(self, spider):
        self.file = open("output.csv", "w", newline="", encoding="utf-8")
        self.writer = csv.DictWriter(self.file, fieldnames=["text", "author"])
        self.writer.writeheader()

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        self.writer.writerow(item)
        return item
```
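Because a pipeline is a plain Python class, you can exercise it outside Scrapy by calling its lifecycle methods by hand (here with `spider=None` and a plain dict in place of a real item):

```python
import csv

# Condensed copy of the CsvPipeline above, so this snippet runs standalone.
class CsvPipeline:
    def open_spider(self, spider):
        self.file = open("output.csv", "w", newline="", encoding="utf-8")
        self.writer = csv.DictWriter(self.file, fieldnames=["text", "author"])
        self.writer.writeheader()

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        self.writer.writerow(item)
        return item

# Drive the lifecycle manually: open, process one item, close.
pipeline = CsvPipeline()
pipeline.open_spider(None)
pipeline.process_item({"text": "Hello", "author": "Ada"}, None)
pipeline.close_spider(None)
```

After this runs, output.csv contains a header row plus one data row. The same trick works for quick unit tests of any pipeline.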
Register it in settings.py:
```python
ITEM_PIPELINES = {
    "my_crawler.pipelines.CsvPipeline": 300,
}
```
## Multiple Pipelines
You can register multiple pipelines; each item passes through them in priority order (lowest number first).
```python
ITEM_PIPELINES = {
    "my_crawler.pipelines.ValidatePipeline": 100,   # runs first
    "my_crawler.pipelines.CsvPipeline": 300,        # runs second
    "my_crawler.pipelines.DatabasePipeline": 500,   # runs last
}
```
Each pipeline class is independent. A common pattern:
| Priority | Pipeline | Purpose |
|---|---|---|
| 100 | ValidatePipeline | Clean or drop invalid items |
| 300 | CsvPipeline | Save to CSV file |
| 500 | DatabasePipeline | Save to database |
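A DatabasePipeline along these lines could use `sqlite3` from the standard library. This is one possible sketch; the database file `items.db`, table name `quotes`, and field names are assumptions for illustration:

```python
import sqlite3

class DatabasePipeline:
    def open_spider(self, spider):
        # items.db and the quotes table are illustrative names
        self.conn = sqlite3.connect("items.db")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS quotes (text TEXT, author TEXT)"
        )

    def close_spider(self, spider):
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        self.conn.execute(
            "INSERT INTO quotes (text, author) VALUES (?, ?)",
            (item.get("text"), item.get("author")),
        )
        return item
```

For a production database you would typically buffer inserts and commit in batches rather than once at close.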
If a pipeline raises `DropItem`, processing stops for that item and it is not passed to later pipelines.