CSV Download Optimization

User
Assistant
The issue with your code not downloading all 32,000 records lies in how PHP handles memory, execution limits, and output buffering for large datasets. Here's how to address it:

### Adjustments to Ensure Large Record Sets Are Processed
1. **Pagination**: Fetch records in smaller batches to prevent memory exhaustion.
2. **Remove `ob_flush` and `flush`**: They are unnecessary once rows are streamed with `fputcsv()`, and calling `ob_flush()` without an active buffer only raises a notice.
3. **Optimize SQL Query**: Ensure your SQL query fetches only the required data.

### Updated Code
Below is a modified version of your script with better handling for large datasets:

```php
<?php

/* Template Name: Download Data */

ini_set('memory_limit', '1024M'); // Raise the PHP memory limit (defining WP_MEMORY_LIMIT here runs too late to take effect)
set_time_limit(0); // Remove the execution time limit
while (ob_get_level()) { ob_end_clean(); } // Discard active output buffers (output_buffering cannot be toggled at runtime)
ini_set('zlib.output_compression', 'Off'); // Disable output compression
global $wpdb;

// Get pagination and category parameters
$pageNum = intval($_GET['pagen'] ?? 1);
$catex = intval($_GET['catex'] ?? -1);
$batchSize = 500; // Process records in batches of 500
$offset = ($pageNum - 1) * $batchSize;

$sqlcon = '';
$sqlj = '';

if ($catex !== -1) {
    $sqlcon = "
        AND tt.taxonomy = 'category'
        AND t.term_id = '{$catex}'";
} else {
    $sqlcon = "
        AND tt.taxonomy = 'category'
        AND t.term_id != '65'";
}

$sqlj = "
    JOIN $wpdb->term_relationships tr ON (p.ID = tr.object_id)
    JOIN $wpdb->term_taxonomy tt ON (tr.term_taxonomy_id = tt.term_taxonomy_id)
    JOIN $wpdb->terms t ON (tt.term_id = t.term_id)";

// Query total record count
$totalQuery = "SELECT COUNT(DISTINCT p.ID) FROM $wpdb->posts p $sqlj 
               WHERE p.post_type = 'post' AND p.post_status = 'publish' $sqlcon";
$totalRecords = $wpdb->get_var($totalQuery);

// Fetch the data in batches
$querystr = "SELECT 
    DISTINCT p.ID, p.post_title AS title, p.post_modified 
    FROM $wpdb->posts p  
    $sqlj 
    WHERE p.post_type = 'post' AND p.post_status = 'publish' 
    $sqlcon 
    ORDER BY p.post_date DESC 
    LIMIT $batchSize OFFSET $offset";

$pageposts = $wpdb->get_results($querystr, OBJECT);

// Prepare output for CSV
header('Content-Type: text/csv; charset=utf-8');
header('Content-Disposition: attachment; filename="post_report_data_page_' . $pageNum . '.csv"');
header('Pragma: no-cache');
header('Expires: 0');

// Write CSV output
$output = fopen('php://output', 'w');

// Add the CSV header only to the first batch so the per-page files can be
// concatenated later without duplicate header rows
if ($pageNum === 1) {
    fputcsv($output, ['Title', 'Url', 'Feature Image Url', 'Category', 'Revision Date', 'Meta Description']);
}

// Write each post's data
foreach ($pageposts as $post) {
    $url = wp_get_attachment_url(get_post_thumbnail_id($post->ID)); // wp_get_attachment_url() takes only the attachment ID
    $categories = implode(', ', array_map(function ($cat) {
        return $cat->cat_name;
    }, get_the_category($post->ID)));

    fputcsv($output, [
        $post->title,
        get_permalink($post->ID),
        $url,
        $categories,
        get_the_modified_date('Y-m-d H:i:s', $post->ID),
        get_post_meta($post->ID, '_yoast_wpseo_metadesc', true)
    ]);
}

fclose($output);

// Pagination message (note: this is appended after the CSV rows in the downloaded file)
if ($offset + $batchSize < $totalRecords) {
    echo "Processed batch {$pageNum}. To download the next set, increment the 'pagen' parameter.";
} else {
    echo "All records processed.";
}
exit;
```

### Key Changes:
1. **Batch Processing**:
– Records are fetched 500 at a time using `LIMIT $batchSize OFFSET $offset`.
– To download the next batch, increment the `pagen` query parameter (`?pagen=2`, `?pagen=3`, etc.).

2. **Dynamic Total Records**:
– Total records are fetched using a `COUNT` query to help with pagination.

3. **CSV Output**:
– Each batch generates a separate CSV file (`post_report_data_page_1.csv`, `post_report_data_page_2.csv`, etc.).

4. **Memory Efficiency**:
– Large datasets are handled incrementally, reducing the chance of memory exhaustion.

### How to Use:
– Run the script with `?pagen=1` to start downloading the first batch.
– Increment `pagen` in the query string (`?pagen=2`, `?pagen=3`, etc.) to fetch subsequent batches; a helper script that automates this is sketched below.
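
If downloading each batch by hand gets tedious, a small helper script can request the pages in sequence and stitch them into a single file. This is only a sketch under a few assumptions: the template is reachable at the hypothetical URL `https://example.com/download-data/`, `allow_url_fopen` is enabled, and each response ends with the status line the template echoes after the CSV rows.

```php
<?php
// Hypothetical helper: fetch every batch from the template above and
// concatenate them into one CSV. The URL below is an assumption; adjust it.
$base = 'https://example.com/download-data/?pagen=%d';
$out  = fopen('post_report_data_full.csv', 'w');

for ($page = 1; ; $page++) {
    $body = file_get_contents(sprintf($base, $page));
    if ($body === false || trim($body) === '') {
        break; // request failed or nothing was returned
    }

    // The template echoes a status message after the CSV rows; strip that line.
    $lines  = explode("\n", trim($body));
    $status = end($lines);
    if (strpos($status, 'Processed batch') === 0 || strpos($status, 'All records') === 0) {
        array_pop($lines);
    }
    fwrite($out, implode("\n", $lines) . "\n");

    if (strpos($status, 'All records processed.') === 0) {
        break; // final batch reached
    }
}
fclose($out);
```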

This approach ensures all records can be downloaded without overwhelming the server.
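
As a side note, since the category filter ends up in raw SQL, it is worth building it with `$wpdb->prepare()` even though `intval()` already sanitizes the input. A minimal sketch of that change, using the same variables as above:

```php
// Same filter logic as in the script above, but built with $wpdb->prepare()
// instead of interpolating the term ID into the string.
if ($catex !== -1) {
    $sqlcon = $wpdb->prepare(" AND tt.taxonomy = 'category' AND t.term_id = %d", $catex);
} else {
    $sqlcon = $wpdb->prepare(" AND tt.taxonomy = 'category' AND t.term_id != %d", 65);
}
```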

User
download all records at once
Assistant
To download all 32,000 records at once, the key is to optimize memory and output handling. The primary issue lies in ensuring the server can handle such a large dataset without running into memory or timeout limits. Here's how you can adjust your code for a one-time download of all records:

### Updated Code for Downloading All Records at Once

```php
<?php

/* Template Name: Download Data */

// Increase memory and execution time limits
ini_set('memory_limit', '1024M'); // Raise the PHP memory limit (defining WP_MEMORY_LIMIT here runs too late to take effect)
set_time_limit(0); // Remove the execution time limit
while (ob_get_level()) { ob_end_clean(); } // Discard active output buffers (output_buffering cannot be toggled at runtime)
ini_set('zlib.output_compression', 'Off'); // Disable output compression
global $wpdb;

// Get category filter (if any)
$catex = intval($_GET['catex'] ?? -1);

$sqlcon = '';
$sqlj = '';

if ($catex !== -1) {
    $sqlcon = "
        AND tt.taxonomy = 'category'
        AND t.term_id = '{$catex}'";
} else {
    $sqlcon = "
        AND tt.taxonomy = 'category'
        AND t.term_id != '65'";
}

$sqlj = "
    JOIN $wpdb->term_relationships tr ON (p.ID = tr.object_id)
    JOIN $wpdb->term_taxonomy tt ON (tr.term_taxonomy_id = tt.term_taxonomy_id)
    JOIN $wpdb->terms t ON (tt.term_id = t.term_id)";

// Query all records
$querystr = "SELECT 
    DISTINCT p.ID, p.post_title AS title, p.post_modified 
    FROM $wpdb->posts p  
    $sqlj 
    WHERE p.post_type = 'post' AND p.post_status = 'publish' 
    $sqlcon 
    ORDER BY p.post_date DESC";

$pageposts = $wpdb->get_results($querystr, OBJECT);

if (empty($pageposts)) {
    die('No posts found for the given criteria.');
}

// Prepare output for CSV
header('Content-Type: text/csv; charset=utf-8');
header('Content-Disposition: attachment; filename="post_report_data.csv"');
header('Pragma: no-cache');
header('Expires: 0');

// Write CSV output
$output = fopen('php://output', 'w');

// Add CSV header
fputcsv($output, ['Title', 'Url', 'Feature Image Url', 'Category', 'Revision Date', 'Meta Description']);

// Fetch and write each post's data directly
foreach ($pageposts as $post) {
    // Fetch the URL for the feature image
    $url = wp_get_attachment_url(get_post_thumbnail_id($post->ID)); // wp_get_attachment_url() takes only the attachment ID

    // Fetch categories
    $categories = implode(', ', array_map(function ($cat) {
        return $cat->cat_name;
    }, get_the_category($post->ID)));

    // Write data to CSV
    fputcsv($output, [
        $post->title,
        get_permalink($post->ID),
        $url,
        $categories,
        get_the_modified_date('Y-m-d H:i:s', $post->ID),
        get_post_meta($post->ID, '_yoast_wpseo_metadesc', true)
    ]);
    
    // Push output to the client as we go so rows do not pile up in a buffer
    if (ob_get_level()) {
        ob_flush();
    }
    flush();
}

// Close CSV output
fclose($output);
exit;
```

### Key Changes in the Code
1. **No Pagination**: The query fetches all records at once, removing the need for pagination (`LIMIT` and `OFFSET` are not used).

2. **Increased Memory and Time Limits**:
– The PHP memory limit is raised to `1024M` (1 GB) with `ini_set('memory_limit', '1024M')`; see the `wp-config.php` alternative after this list.
– Execution time is unlimited with `set_time_limit(0)`.

3. **Efficient Buffer Handling**:
– `flush()` (and `ob_flush()` when a buffer is active) is called within the loop so each row is sent to the client immediately rather than piling up in an output buffer.

4. **Direct Output to CSV**:
– Rows are written straight to `php://output` with `fputcsv()`, so the script never builds the full CSV as a string in memory. Note that `$wpdb->get_results()` still loads every matching row into memory at once, which is why the memory limit is raised above.

5. **Meta Description Handling**:
– `_yoast_wpseo_metadesc` metadata is included for SEO purposes.
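
If your host does not allow raising the memory limit at runtime with `ini_set()`, the limit can instead be set in `wp-config.php`, which is where WordPress actually reads the `WP_MEMORY_LIMIT` constant (defining it inside a template runs too late to have any effect):

```php
// In wp-config.php, above the "That's all, stop editing!" line:
define('WP_MEMORY_LIMIT', '1024M');
```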

### Notes:
– **Performance Considerations**:
– This script should handle roughly 32,000 records if your server has sufficient resources. For significantly larger datasets you may still need batch processing; a chunked variant that keeps a single download is sketched after these notes.

– **Server Resources**:
– Ensure your server has adequate memory (RAM) and processing power. If using shared hosting, this script might time out or crash due to server limits.

– **Test Before Full Execution**:
– Test the script with smaller datasets to confirm its functionality before running it for the entire record set.
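
If the one-shot query above still exhausts memory on your server, a middle ground is to keep a single CSV download but pull the rows in chunks inside the same request. The sketch below replaces the single `$wpdb->get_results()` call and the output loop in the script above; it reuses `$wpdb`, `$sqlj` and `$sqlcon`, assumes the same `header()` calls have already been sent, and the 500-row chunk size is an arbitrary choice you can tune.

```php
// Sketch: one streamed CSV, with rows fetched 500 at a time so the full
// result set is never held in memory at once.
$chunkSize = 500;
$offset    = 0;

$output = fopen('php://output', 'w');
fputcsv($output, ['Title', 'Url', 'Feature Image Url', 'Category', 'Revision Date', 'Meta Description']);

while (true) {
    $rows = $wpdb->get_results(
        "SELECT DISTINCT p.ID, p.post_title AS title, p.post_modified
         FROM $wpdb->posts p $sqlj
         WHERE p.post_type = 'post' AND p.post_status = 'publish' $sqlcon
         ORDER BY p.post_date DESC
         LIMIT $chunkSize OFFSET $offset"
    );

    if (empty($rows)) {
        break; // nothing left to fetch (or the query failed)
    }

    foreach ($rows as $post) {
        fputcsv($output, [
            $post->title,
            get_permalink($post->ID),
            wp_get_attachment_url(get_post_thumbnail_id($post->ID)),
            implode(', ', array_map(function ($cat) { return $cat->cat_name; }, get_the_category($post->ID))),
            get_the_modified_date('Y-m-d H:i:s', $post->ID),
            get_post_meta($post->ID, '_yoast_wpseo_metadesc', true),
        ]);
    }

    flush(); // push this chunk to the browser before fetching the next one

    if (count($rows) < $chunkSize) {
        break; // last, partial chunk
    }
    $offset += $chunkSize;
}

fclose($output);
exit;
```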

Let me know if you face any issues or need further assistance!
