CSV Download Optimization
January 28, 2025 · AI Conversations
### Adjustments to Ensure Large Record Sets Are Processed
1. **Pagination**: Fetch records in smaller batches to prevent memory exhaustion.
2. **Remove `ob_flush()` and `flush()`**: With small batches these calls are unnecessary, and stray buffer flushes can interfere with the CSV output.
3. **Optimize SQL Query**: Ensure your SQL query fetches only the required data.
### Updated Code
Below is a modified version of your script with better handling for large datasets:
```php
<?php
/* Template Name: Download Data */

ini_set('memory_limit', '1024M'); // raise the PHP memory limit (defining WP_MEMORY_LIMIT here is too late; WordPress reads it at bootstrap)
set_time_limit(0);                // remove the execution time limit
while (ob_get_level() > 0) {      // output_buffering cannot be toggled at runtime,
    ob_end_clean();               // so discard any buffers that are already active
}
ini_set('zlib.output_compression', 'Off'); // disable compression so rows stream immediately

global $wpdb;

// Get pagination and category parameters
$pageNum   = intval($_GET['pagen'] ?? 1);
$catex     = intval($_GET['catex'] ?? -1);
$batchSize = 500; // process records in batches of 500
$offset    = ($pageNum - 1) * $batchSize;

if ($catex !== -1) {
    // Export a single category; term_id is an integer, so %d is safe
    $sqlcon = $wpdb->prepare("
        AND tt.taxonomy = 'category'
        AND t.term_id = %d", $catex);
} else {
    // Export everything except the excluded category
    $sqlcon = "
        AND tt.taxonomy = 'category'
        AND t.term_id != 65";
}

$sqlj = "
    JOIN $wpdb->term_relationships tr ON (p.ID = tr.object_id)
    JOIN $wpdb->term_taxonomy tt ON (tr.term_taxonomy_id = tt.term_taxonomy_id)
    JOIN $wpdb->terms t ON (tt.term_id = t.term_id)";

// Query the total record count (used to tell the client when paging is done)
$totalQuery = "SELECT COUNT(DISTINCT p.ID) FROM $wpdb->posts p $sqlj
    WHERE p.post_type = 'post' AND p.post_status = 'publish' $sqlcon";
$totalRecords = (int) $wpdb->get_var($totalQuery);

// Fetch one batch of data
$querystr = "SELECT
    DISTINCT p.ID, p.post_title AS title, p.post_modified
    FROM $wpdb->posts p
    $sqlj
    WHERE p.post_type = 'post' AND p.post_status = 'publish'
    $sqlcon
    ORDER BY p.post_date DESC
    LIMIT $batchSize OFFSET $offset";
$pageposts = $wpdb->get_results($querystr, OBJECT);

// Prepare output for CSV
header('Content-Type: text/csv; charset=utf-8');
header('Content-Disposition: attachment; filename="post_report_data_page_' . $pageNum . '.csv"');
header('X-Total-Records: ' . $totalRecords); // lets a client decide when to stop paging
header('Pragma: no-cache');
header('Expires: 0');

$output = fopen('php://output', 'w');

// Write the CSV header row on the first page only, so batches concatenate cleanly
if ($pageNum === 1) {
    fputcsv($output, ['Title', 'Url', 'Feature Image Url', 'Category', 'Revision Date', 'Meta Description']);
}

// Write each post's data
foreach ($pageposts as $post) {
    $url = wp_get_attachment_image_url(get_post_thumbnail_id($post->ID), 'thumbnail');
    $categories = implode(', ', array_map(function ($cat) {
        return $cat->cat_name;
    }, get_the_category($post->ID)));

    fputcsv($output, [
        $post->title,
        get_permalink($post->ID),
        $url,
        $categories,
        get_the_modified_date('Y-m-d H:i:s', $post->ID),
        get_post_meta($post->ID, '_yoast_wpseo_metadesc', true),
    ]);
}

fclose($output);
// Note: echoing a status message here would be appended to the downloaded CSV,
// which is why the total count is exposed through the X-Total-Records header instead.
exit;
```
### Key Changes:
1. **Batch Processing**:
   - Records are fetched 500 at a time using `LIMIT $batchSize OFFSET $offset`.
   - To download the next batch, increment the `pagen` query parameter (`?pagen=2`, `?pagen=3`, etc.).
2. **Dynamic Total Records**:
   - The total record count is fetched with a `COUNT` query and exposed via an `X-Total-Records` response header, so a client can tell when paging is finished.
3. **CSV Output**:
   - Each batch generates a separate CSV file (`post_report_data_page_1.csv`, `post_report_data_page_2.csv`, etc.). The header row is written only on page 1, so the files concatenate cleanly.
4. **Memory Efficiency**:
   - Large datasets are handled incrementally, reducing the chance of memory exhaustion.
### How to Use:
- Run the script with `?pagen=1` to start downloading the first batch.
- Increment `pagen` in the query string (`?pagen=2`, `?pagen=3`, etc.) to fetch subsequent batches.
This approach ensures all records can be downloaded without overwhelming the server.
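If stitching the batches together by hand is tedious, a small client-side loop can automate it. The sketch below is illustrative only: it assumes the template is published at a hypothetical `https://example.com/download-data/` URL, that `$batchSize` matches the template's 500, that `allow_url_fopen` is enabled, and that no CSV field contains an embedded newline.

```php
<?php
// Minimal client sketch: download every batch and stitch them into one CSV.
// A batch with fewer than $batchSize data rows means the last page was reached.
$baseUrl   = 'https://example.com/download-data/'; // hypothetical page URL
$batchSize = 500;                                  // must match the template
$out       = fopen('post_report_data_full.csv', 'w');

for ($page = 1; ; $page++) {
    $csv = file_get_contents($baseUrl . '?pagen=' . $page);
    if ($csv === false || trim($csv) === '') {
        break; // request failed or the batch was empty
    }
    fwrite($out, $csv);

    // Naive row count; only valid while no field contains an embedded newline.
    $rows = substr_count(trim($csv), "\n") + 1;
    if ($rows < $batchSize) {
        break; // short batch: last page reached
    }
}
fclose($out);
```

Alternatively, the client could read the `X-Total-Records` response header the template sends and compute the number of pages up front.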
### Updated Code for Downloading All Records at Once
```php
<?php
/* Template Name: Download Data */

// Increase memory and execution time limits
ini_set('memory_limit', '1024M'); // raise the PHP memory limit (defining WP_MEMORY_LIMIT here is too late; WordPress reads it at bootstrap)
set_time_limit(0);                // remove the execution time limit
while (ob_get_level() > 0) {      // output_buffering cannot be toggled at runtime,
    ob_end_clean();               // so discard any buffers that are already active
}
ini_set('zlib.output_compression', 'Off'); // disable compression so rows stream immediately

global $wpdb;

// Get the category filter (if any)
$catex = intval($_GET['catex'] ?? -1);

if ($catex !== -1) {
    // Export a single category; term_id is an integer, so %d is safe
    $sqlcon = $wpdb->prepare("
        AND tt.taxonomy = 'category'
        AND t.term_id = %d", $catex);
} else {
    // Export everything except the excluded category
    $sqlcon = "
        AND tt.taxonomy = 'category'
        AND t.term_id != 65";
}

$sqlj = "
    JOIN $wpdb->term_relationships tr ON (p.ID = tr.object_id)
    JOIN $wpdb->term_taxonomy tt ON (tr.term_taxonomy_id = tt.term_taxonomy_id)
    JOIN $wpdb->terms t ON (tt.term_id = t.term_id)";

// Query all records at once
$querystr = "SELECT
    DISTINCT p.ID, p.post_title AS title, p.post_modified
    FROM $wpdb->posts p
    $sqlj
    WHERE p.post_type = 'post' AND p.post_status = 'publish'
    $sqlcon
    ORDER BY p.post_date DESC";
$pageposts = $wpdb->get_results($querystr, OBJECT);

if (empty($pageposts)) {
    die('No posts found for the given criteria.');
}

// Prepare output for CSV
header('Content-Type: text/csv; charset=utf-8');
header('Content-Disposition: attachment; filename="post_report_data.csv"');
header('Pragma: no-cache');
header('Expires: 0');

$output = fopen('php://output', 'w');

// Add the CSV header row
fputcsv($output, ['Title', 'Url', 'Feature Image Url', 'Category', 'Revision Date', 'Meta Description']);

// Fetch and write each post's data directly
foreach ($pageposts as $post) {
    // Fetch the URL for the feature image
    $url = wp_get_attachment_image_url(get_post_thumbnail_id($post->ID), 'thumbnail');

    // Fetch the category names
    $categories = implode(', ', array_map(function ($cat) {
        return $cat->cat_name;
    }, get_the_category($post->ID)));

    // Write the row to the CSV
    fputcsv($output, [
        $post->title,
        get_permalink($post->ID),
        $url,
        $categories,
        get_the_modified_date('Y-m-d H:i:s', $post->ID),
        get_post_meta($post->ID, '_yoast_wpseo_metadesc', true),
    ]);

    // Push the row to the client immediately to keep memory use flat
    if (ob_get_level() > 0) {
        ob_flush();
    }
    flush();
}

fclose($output);
exit;
```
### Key Changes in the Code
1. **No Pagination**: The query fetches all records at once, so `LIMIT` and `OFFSET` are not used.
2. **Increased Memory and Time Limits**:
   - The PHP memory limit is raised to `1024M` (1 GB) with `ini_set('memory_limit', '1024M')`. Defining `WP_MEMORY_LIMIT` inside a template has no effect, since WordPress reads that constant during bootstrap.
   - Execution time is unlimited with `set_time_limit(0)`.
3. **Efficient Buffer Handling**:
   - `flush()` (and `ob_flush()` whenever a buffer is active) is called within the loop to send each row to the client immediately, avoiding memory buildup.
4. **Direct Output to CSV**:
   - Rows are streamed directly to the browser, so the assembled CSV is never held in memory. Note that `$wpdb->get_results()` still loads the full result set, which is why the memory limit is raised.
5. **Meta Description Handling**:
   - The `_yoast_wpseo_metadesc` meta field is included for SEO purposes.
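Posts that have no Yoast meta description saved will simply produce an empty column. If that matters, a small fallback helper (hypothetical, not part of the script above) could substitute the post excerpt:

```php
// Hypothetical helper: fall back to the post excerpt when no Yoast meta
// description has been saved for a post.
function report_meta_description(int $post_id): string {
    $desc = get_post_meta($post_id, '_yoast_wpseo_metadesc', true);
    if ($desc === '' || $desc === false) {
        $desc = get_the_excerpt($post_id);
    }
    return wp_strip_all_tags((string) $desc);
}
```

Swapping this in for the `get_post_meta()` call inside the loop would fill the column for every post.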
### Notes:
- **Performance Considerations**: This script should handle a dataset on the order of 32,000 records if your server has sufficient resources. For significantly larger datasets, fall back to the batch-processing version above.
- **Server Resources**: Ensure your server has adequate memory (RAM) and processing power. On shared hosting, this script may time out or crash due to server limits.
- **Test Before Full Execution**: Test the script with a smaller dataset to confirm it works before running it against the entire record set; a sketch of one way to do that follows.
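One way to do that, assuming a hypothetical `limit` query parameter that the original script does not define, is to cap the query before executing it:

```php
// Hypothetical test switch: cap the result set with a "limit" query parameter
// (e.g. ?limit=50) so the script can be verified on a small sample first.
$testLimit = intval($_GET['limit'] ?? 0);
if ($testLimit > 0) {
    $querystr .= " LIMIT {$testLimit}";
}
$pageposts = $wpdb->get_results($querystr, OBJECT);
```

This would replace the existing `$wpdb->get_results()` call in the full-download script.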
Let me know if you face any issues or need further assistance!