CSV Data Download Script

User
Assistant
Loading more than 50,000 records in a single query can exhaust PHP's memory limit, because every row is held in memory at once. To handle a dataset this large efficiently, the script needs to process records in batches and write each batch straight to the response. Here's an updated version of your script that fetches records in batches and streams the CSV output so memory usage stays flat:

<?php

/* Template Name: Download Data */

global $wpdb;

$sqlcon = '';
$sqlj = '';

if (isset($_GET['pagen'])) {
    $pageNum = max(1, (int) $_GET['pagen']); // Cast to a positive integer before using it in the query
} else {
    $pageNum = 1;
}

$batchSize = 5000; // Process 5000 records at a time
$offset = ($pageNum - 1) * $batchSize;

if (!empty($_GET['catex'])) {
    $catex = (int) $_GET['catex']; // Cast to an integer so it is safe to embed in the SQL below
    if ($catex != -1) {
        $sqlcon = " 
        AND tt.taxonomy = 'category'
        AND t.term_id = '" . $catex . "'";
    } else {
        $sqlcon = " 
        AND tt.taxonomy = 'category'
        AND t.term_id != '65'"; // All except plant name
    }

    $sqlj = "
    JOIN $wpdb->term_relationships tr ON (p.ID = tr.object_id)
    JOIN $wpdb->term_taxonomy tt ON (tr.term_taxonomy_id = tt.term_taxonomy_id)
    JOIN $wpdb->terms t ON (tt.term_id = t.term_id)";
}

// Set headers to stream the file
header('Content-Type: text/csv; charset=utf-8');
header('Content-Disposition: attachment; filename="post_report_data.csv"');
header('Pragma: no-cache');
header('Expires: 0');

// Open the output stream
$fp = fopen('php://output', 'w');

// Write the CSV headers
fputcsv($fp, ['Title', 'Url', 'Feature Image Url', 'Category', 'Revision Date', 'Meta Description']);

while (true) {
    $querystr = "SELECT 
        ID, post_title AS title, post_modified 
        FROM $wpdb->posts p 
        $sqlj
        WHERE p.post_type='post' AND p.post_status='publish' $sqlcon 
        ORDER BY post_date DESC 
        LIMIT $offset, $batchSize";

    $pageposts = $wpdb->get_results($querystr, OBJECT);

    if (empty($pageposts)) {
        break; // Stop if no more records
    }

    foreach ($pageposts as $post) {
        // wp_get_attachment_url() takes only the attachment ID; use get_the_post_thumbnail_url() if a specific size is needed
        $url = wp_get_attachment_url(get_post_thumbnail_id($post->ID));
        $categories = implode(', ', array_map(function ($cat) {
            return $cat->cat_name;
        }, get_the_category($post->ID)));
        $revision_date = get_the_modified_date('Y-m-d H:i:s', $post->ID);
        $meta_desc = get_post_meta($post->ID, '_yoast_wpseo_metadesc', true);

        fputcsv($fp, [
            $post->title,
            get_permalink($post->ID),
            $url,
            $categories,
            $revision_date,
            $meta_desc
        ]);
    }

    // Increment the offset for the next batch
    $offset += $batchSize;

    // Flush the output buffer (if one is active) so each batch is sent to the browser immediately
    if (ob_get_level() > 0) {
        ob_flush();
    }
    flush();
}

// Close the output stream
fclose($fp);

exit;
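
If the filter values ever come from untrusted input, the same clauses can also be built with `$wpdb->prepare()` instead of string concatenation. This is a minimal sketch of that variant, not part of the script above, assuming the same `$_GET['catex']` convention (a category term ID, or `-1` for everything except term 65):

```php
$sqlcon = '';
$sqlj   = '';

if (!empty($_GET['catex'])) {
    $catex = (int) $_GET['catex'];

    // Joins are the same as in the main script.
    $sqlj = "
    JOIN $wpdb->term_relationships tr ON (p.ID = tr.object_id)
    JOIN $wpdb->term_taxonomy tt ON (tr.term_taxonomy_id = tt.term_taxonomy_id)
    JOIN $wpdb->terms t ON (tt.term_id = t.term_id)";

    if ($catex != -1) {
        // %d forces the value to an integer inside the generated SQL
        $sqlcon = $wpdb->prepare(" AND tt.taxonomy = 'category' AND t.term_id = %d", $catex);
    } else {
        $sqlcon = $wpdb->prepare(" AND tt.taxonomy = 'category' AND t.term_id != %d", 65);
    }
}
```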

### Key Changes and Optimizations:
1. **Batch Processing**:
   - Fetches and processes records in batches of 5,000.
   - Prevents memory exhaustion by holding only a small number of records in memory at a time (a keyset-pagination alternative is sketched after this list).

2. **Streaming Output**:
   - Rows are written directly to the output stream (`php://output`) instead of being buffered in memory.

3. **Header Setup**:
   - Ensures the browser treats the response as a downloadable CSV file.

4. **Efficient Memory Management**:
   - Flushing the output buffer after each batch keeps the response streaming and the memory footprint flat.

5. **Pagination or Filtering**:
   - You can control the batch size via `$batchSize` or adjust the query parameters to target specific subsets of data.
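
On very large tables, `LIMIT offset, count` pagination gets slower as the offset grows, because MySQL still has to scan past every skipped row. A keyset-pagination variant of the batch loop avoids this; the sketch below is only an illustration, not part of the script above (it reuses `$sqlj`, `$sqlcon`, and `$batchSize`, orders by `p.ID` instead of `post_date`, and omits the row-writing code already shown):

```php
$lastId = PHP_INT_MAX; // Start above any real post ID

while (true) {
    // Each batch picks up where the previous one stopped, keyed on the last ID seen
    $querystr = $wpdb->prepare(
        "SELECT DISTINCT p.ID, p.post_title AS title, p.post_modified
         FROM $wpdb->posts p
         $sqlj
         WHERE p.post_type = 'post' AND p.post_status = 'publish' $sqlcon
           AND p.ID < %d
         ORDER BY p.ID DESC
         LIMIT %d",
        $lastId,
        $batchSize
    );

    $pageposts = $wpdb->get_results($querystr, OBJECT);

    if (empty($pageposts)) {
        break;
    }

    foreach ($pageposts as $post) {
        // ... build and fputcsv() the row exactly as in the main loop ...
        $lastId = $post->ID;
    }

    flush();
}
```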

This script should handle large datasets efficiently, even with over 50,000 records.
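
To run the export, assign the "Download Data" template to a page and open that page in the browser; the download starts immediately. The category filter is passed as a query-string parameter, for example `https://example.com/download-data/?catex=12`, or `?catex=-1` for everything except the excluded term (the domain and slug here are only illustrative). The `pagen` parameter lets you resume from a later batch, e.g. `?pagen=3` skips the first two batches of 5,000 rows.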
