Exporting Large WordPress Data
January 28, 2025 · AI Conversations
### Key Adjustments:
1. **Chunk Processing:** Instead of loading all data into memory, process the data in chunks using pagination.
2. **Memory and Time Limits:** Increase PHP's memory and execution time to handle large exports.
3. **Efficient Query:** Use `LIMIT` in the SQL query to process a specific number of rows per batch.
### Updated Code:
<?php
/* Template Name: Download Data */
global $wpdb;
// Adjust memory and execution limits for large exports
ini_set('memory_limit', '512M'); // Adjust as needed
set_time_limit(0);
$sqlcon = '';
$sqlj = '';
$pageNum = isset($_GET['pagen']) ? intval($_GET['pagen']) : 1; // Current page number
$chunkSize = 500; // Number of rows per chunk
$previousRows = ($pageNum - 1) * $chunkSize;
$totalRecord = "";
$dataA = [['Title', 'Url', 'Feature Image Url', 'Category', 'Revision Date', 'Meta Description']];
// Handle category filtering
if (!empty($_GET['catex'])) {
if ($_GET['catex'] != -1) {
$sqlcon = "
AND tt.taxonomy = 'category'
AND t.term_id = '" . intval($_GET['catex']) . "'";
} else {
$sqlcon = "
AND tt.taxonomy = 'category'
AND t.term_id != '65'"; // Exclude specific category
}
$sqlj = "
JOIN $wpdb->term_relationships tr ON (p.ID = tr.object_id)
JOIN $wpdb->term_taxonomy tt ON (tr.term_taxonomy_id = tt.term_taxonomy_id)
JOIN $wpdb->terms t ON (tt.term_id = t.term_id)";
}
// Query posts in chunks; DISTINCT avoids duplicate rows when a post belongs to more than one category
$querystr = "
SELECT DISTINCT
ID, post_title AS title, post_modified, post_date
FROM $wpdb->posts p
$sqlj
WHERE p.post_type = 'post'
AND p.post_status = 'publish'
$sqlcon
ORDER BY post_date DESC
LIMIT $previousRows, $chunkSize";
$pageposts = $wpdb->get_results($querystr, OBJECT);
// Process posts
foreach ($pageposts as $post) {
$url = wp_get_attachment_url(get_post_thumbnail_id($post->ID)); // wp_get_attachment_url() takes only the attachment ID
$categories = implode(', ', array_map(function ($cat) {
return $cat->cat_name;
}, get_the_category($post->ID)));
$dataA[] = [
$post->title,
get_permalink($post->ID),
$url,
$categories,
get_the_modified_date('Y-m-d', $post->ID),
get_post_meta($post->ID, '_yoast_wpseo_metadesc', true)
];
}
// Convert to CSV and handle download
convert_to_csv($dataA, 'post_report_data.csv', ',');
function convert_to_csv($input_array, $output_file_name, $delimiter)
{
$f = fopen('php://memory', 'w+'); // open read-write so the stream can be rewound and read back by fpassthru()
foreach ($input_array as $line) {
fputcsv($f, $line, $delimiter);
}
fseek($f, 0);
header('Content-Type: text/csv; charset=utf-8');
header('Content-Disposition: attachment; filename="' . $output_file_name . '";');
fpassthru($f);
fclose($f);
exit; // stop WordPress from rendering the rest of the template after the CSV
}
### Explanation of Changes:
1. **Pagination with `LIMIT`:** The query uses `LIMIT $previousRows, $chunkSize` to retrieve a specific range of rows. Adjust `$chunkSize` to control the number of rows processed per page.
2. **Memory Management:** Increased `memory_limit` and disabled the execution time limit (`set_time_limit(0)`).
3. **Per-Post Lookups:** `get_post_meta()` and `get_the_category()` fill in the remaining columns for each row. Because the posts come from a raw `$wpdb` query, their caches are not pre-primed, so each call may add a query per post; keep `$chunkSize` modest to limit the overhead.
### Handling 50,000+ Rows:
– **Iterate Over Pages:** Trigger this script with different `pagen` values (e.g., via `?pagen=1`, `?pagen=2`).
– **Combine Results:** Each page downloads its respective chunk of data as a CSV file. You can combine these files manually if needed or automate it later.
Alternatively, for a complete export without manual interaction, consider writing all chunks to a single temporary file and serving that file once the export is complete.
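As a rough illustration of that single-file approach, the sketch below loops over the same chunked query server-side, appends every chunk to a temporary file in the uploads directory, and streams the finished file. The file path, chunk size, and column list are assumptions to adapt to the template above, not a drop-in replacement.

```php
// Hedged sketch: build the complete CSV in one server-side pass, then serve it.
global $wpdb;

$chunkSize = 500;
$page      = 1;
$tmpFile   = wp_upload_dir()['basedir'] . '/post_report_data.csv'; // assumed writable location

$f = fopen($tmpFile, 'w');
fputcsv($f, ['Title', 'Url', 'Feature Image Url', 'Category', 'Revision Date', 'Meta Description']);

do {
    // Fetch the next chunk of published posts, newest first.
    $results = $wpdb->get_results($wpdb->prepare(
        "SELECT ID, post_title FROM {$wpdb->posts}
         WHERE post_type = 'post' AND post_status = 'publish'
         ORDER BY post_date DESC
         LIMIT %d, %d",
        ($page - 1) * $chunkSize,
        $chunkSize
    ));

    foreach ($results as $post) {
        fputcsv($f, [
            $post->post_title,
            get_permalink($post->ID),
            wp_get_attachment_url(get_post_thumbnail_id($post->ID)),
            implode(', ', wp_list_pluck(get_the_category($post->ID), 'name')),
            get_the_modified_date('Y-m-d', $post->ID),
            get_post_meta($post->ID, '_yoast_wpseo_metadesc', true),
        ]);
    }
    $page++;
} while (count($results) === $chunkSize);

fclose($f);

// Stream the finished file once every chunk has been written.
header('Content-Type: text/csv; charset=utf-8');
header('Content-Disposition: attachment; filename="post_report_data.csv"');
readfile($tmpFile);
exit;
```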
The webpage at https://www.backyardgardener.com/download-csv/?pagen=1&$previousRows=0&catex=65 might be temporarily down or it may have moved permanently to a new web address.
ERR_INVALID_RESPONSE
Here are steps to resolve this issue:
—
### 1. **Optimize the Script**
– Remove unnecessary code.
– Ensure the script only processes the requested chunk of data.
—
### 2. **Debug Memory and Execution Limits**
Raise the memory limit in `wp-config.php` and put the runtime settings at the top of the export template:
define('WP_MEMORY_LIMIT', '512M'); // wp-config.php: raise the WordPress memory limit
set_time_limit(0); // template: remove the execution time limit
ini_set('zlib.output_compression', 'Off'); // template: disable output compression
Note that `output_buffering` is a PHP_INI_PERDIR setting, so `ini_set('output_buffering', 'off')` has no effect at runtime; change it in `php.ini` or `.user.ini` if buffering is an issue.
—
### 3. **Output Data in Real-Time**
Send the CSV data in chunks as it’s being generated. Replace the `convert_to_csv` function with a real-time output approach:
function convert_to_csv($input_array, $output_file_name, $delimiter)
{
    // Close any active output buffers so rows reach the browser as they are written
    while (ob_get_level() > 0) {
        ob_end_flush();
    }
    header('Content-Type: text/csv; charset=utf-8');
    header('Content-Disposition: attachment; filename="' . $output_file_name . '";');
    $f = fopen('php://output', 'w');
    foreach ($input_array as $line) {
        fputcsv($f, $line, $delimiter);
        flush(); // push each row to the client immediately
    }
    fclose($f);
    exit; // prevent the rest of the template from being appended to the CSV
}
—
### 4. **Paginate the Output**
Modify the URL parameter to export data in smaller batches (`chunkSize`) and download each batch separately. For example:
https://www.backyardgardener.com/download-csv/?pagen=1&catex=65
https://www.backyardgardener.com/download-csv/?pagen=2&catex=65
Each URL should handle one chunk of data (`LIMIT`) based on the `pagen` parameter.
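If you would rather automate fetching and combining those batches than download each one by hand, something along these lines could work; the function name, base URL, category ID, and output path below are assumptions for illustration only.

```php
// Hypothetical combiner: fetch each paginated CSV and append it to one file.
function bg_combine_csv_batches($baseUrl, $catex, $outputPath, $maxPages = 200) {
    $out = fopen($outputPath, 'w');

    for ($page = 1; $page <= $maxPages; $page++) {
        $url      = add_query_arg(['pagen' => $page, 'catex' => $catex], $baseUrl);
        $response = wp_remote_get($url, ['timeout' => 120]);

        if (is_wp_error($response) || wp_remote_retrieve_response_code($response) !== 200) {
            break; // request failed: stop and keep what has been written so far
        }

        $body = trim(wp_remote_retrieve_body($response));
        if ($body === '' || strpos($body, 'No posts found') === 0) {
            break; // the export template reports no more rows
        }

        $lines = explode("\n", $body);
        if ($page > 1) {
            array_shift($lines); // drop the repeated header row on later pages
        }
        fwrite($out, implode("\n", $lines) . "\n");
    }

    fclose($out);
}

// Example: combine the category-65 export into a single file.
bg_combine_csv_batches('https://www.backyardgardener.com/download-csv/', 65, '/tmp/combined_post_report.csv');
```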
—
### 5. **Use WP CLI for Large Exports**
If the dataset is too large for a web request, switch to using WP CLI for server-side processing. Create a custom WP CLI command:
if (defined('WP_CLI') && WP_CLI) {
WP_CLI::add_command('export_posts', 'export_posts_to_csv');
}
function export_posts_to_csv($args, $assoc_args) {
global $wpdb;
$chunkSize = 500; // Number of rows per batch
$page = 1;
$output_file = 'export.csv';
$f = fopen($output_file, 'w');
fputcsv($f, ['Title', 'Url', 'Feature Image Url', 'Category', 'Revision Date', 'Meta Description']);
do {
$offset = ($page - 1) * $chunkSize;
$query = "
SELECT ID, post_title, post_modified
FROM {$wpdb->posts}
WHERE post_type = 'post' AND post_status = 'publish'
ORDER BY post_date DESC
LIMIT $offset, $chunkSize";
$results = $wpdb->get_results($query);
foreach ($results as $post) {
$url = wp_get_attachment_url(get_post_thumbnail_id($post->ID));
$categories = implode(', ', array_map(function ($cat) {
return $cat->cat_name;
}, get_the_category($post->ID)));
fputcsv($f, [
$post->post_title,
get_permalink($post->ID),
$url,
$categories,
get_the_modified_date('Y-m-d', $post->ID),
get_post_meta($post->ID, '_yoast_wpseo_metadesc', true),
]);
}
$page++;
} while (count($results) === $chunkSize);
fclose($f);
WP_CLI::success('Export completed: ' . $output_file);
}
**Run the Command**:
wp export_posts
This sidesteps web-request memory and timeout limits, since the CLI environment can handle much larger exports.
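The registration code needs to load before WP-CLI dispatches commands; one common option (an assumed file layout, not the only one) is a small must-use plugin:

```php
<?php
// wp-content/mu-plugins/export-posts-command.php (hypothetical location)
// Paste the export_posts_to_csv() function from the block above into this file,
// then register the command only when running under WP-CLI.
if (defined('WP_CLI') && WP_CLI) {
    WP_CLI::add_command('export_posts', 'export_posts_to_csv');
}
```

With that file in place, `wp export_posts` run from the site root writes `export.csv` to the current directory; use an absolute path for `$output_file` in the function if you need it written somewhere specific.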
—
### 6. **Asynchronous Exports**
For very large datasets:
– Use `wp_schedule_event` or custom AJAX requests to export data incrementally.
– Store progress in a temporary file or database table, then generate the full CSV when all chunks are processed (a rough sketch follows).
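For example, a minimal cron-based sketch might look like the following; the `bg_export_chunk` hook, the `bg_export_progress` option, and the file path are invented names for this illustration, and only the first two CSV columns are shown (the rest follow the template above).

```php
// Hypothetical incremental export driven by WP-Cron: one chunk per run.
add_action('bg_export_chunk', function () {
    global $wpdb;

    $chunkSize = 500;
    $page      = (int) get_option('bg_export_progress', 1);
    $file      = wp_upload_dir()['basedir'] . '/export-in-progress.csv';

    $f = fopen($file, $page === 1 ? 'w' : 'a'); // first run creates the file, later runs append
    if ($page === 1) {
        fputcsv($f, ['Title', 'Url']); // remaining columns as in the template above
    }

    $results = $wpdb->get_results($wpdb->prepare(
        "SELECT ID, post_title FROM {$wpdb->posts}
         WHERE post_type = 'post' AND post_status = 'publish'
         ORDER BY post_date DESC LIMIT %d, %d",
        ($page - 1) * $chunkSize,
        $chunkSize
    ));

    foreach ($results as $post) {
        fputcsv($f, [$post->post_title, get_permalink($post->ID)]);
    }
    fclose($f);

    if (count($results) === $chunkSize) {
        update_option('bg_export_progress', $page + 1); // more chunks remain
    } else {
        delete_option('bg_export_progress');            // finished
        wp_clear_scheduled_hook('bg_export_chunk');
    }
});

// Schedule the recurring job once; WP-Cron runs it hourly until the export completes and the hook is cleared.
if (!wp_next_scheduled('bg_export_chunk')) {
    wp_schedule_event(time(), 'hourly', 'bg_export_chunk');
}
```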
—
### 7. **Server Logs**
If the issue persists, check server logs (e.g., Apache or Nginx error logs) to identify:
– PHP errors (memory exhaustion, timeouts).
– Server limits (connection drops, request size limits).
—
By implementing these changes, you should avoid the `ERR_INVALID_RESPONSE` issue and handle large dataset exports efficiently.
### Improvements and Fixes
#### 1. **Memory and Timeout Handling**
You're already increasing memory and execution limits at the top of the script. This is good, but ensure the server allows these overrides.
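To verify that the overrides actually apply on your host (purely a diagnostic suggestion), you can log the effective values near the top of the template and check the PHP error log:

```php
// Diagnostic only: record the limits PHP is actually using for this request.
error_log(sprintf(
    'CSV export: memory_limit=%s, max_execution_time=%s',
    ini_get('memory_limit'),
    ini_get('max_execution_time')
));
```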
#### 2. **Dynamic Pagination**
Since you're fetching posts in batches using `$previousRows` and `LIMIT`, you should uncomment the `LIMIT` clause in your SQL query:
$totalRecord = "LIMIT $previousRows, 500"; // Fetch 500 rows per page
This ensures the query only fetches a manageable number of records.
—
#### 3. **Sanitize Input**
Always sanitize user input to prevent SQL injection or unexpected behavior. Use `intval()` or `sanitize_text_field()`:
$pageNum = intval($_GET['pagen'] ?? 1);
$catex = intval($_GET['catex'] ?? -1);
—
#### 4. **Output Buffering**
Prevent buffering issues by flushing output periodically while writing to the CSV. Add this inside your `foreach` loop (guard `ob_flush()` so it only runs when a buffer is actually active, otherwise it raises a notice):
if (ob_get_level() > 0) {
ob_flush();
}
flush();
—
#### 5. **Error Handling**
Add error handling to gracefully manage empty results or query issues:
if (empty($pageposts)) {
die('No posts found for the given criteria.');
}
—
#### 6. **Adjust Headers**
Ensure headers are set correctly to prevent browser caching or improper handling:
header('Content-Type: text/csv; charset=utf-8');
header('Content-Disposition: attachment; filename="post_report_data.csv"');
header('Pragma: no-cache');
header('Expires: 0');
—
### Updated Code
Here’s the optimized version of your script:
<?php
/* Template Name: Download Data */
ini_set('memory_limit', '512M'); // Raise the PHP memory limit at runtime (defining WP_MEMORY_LIMIT here has no effect; WordPress reads it during bootstrap)
set_time_limit(0); // Remove the execution time limit
while (ob_get_level() > 0) { ob_end_clean(); } // Discard any active output buffers; output_buffering itself cannot be toggled with ini_set() at runtime
ini_set('zlib.output_compression', 'Off'); // Disable output compression
global $wpdb;
$pageNum = intval($_GET['pagen'] ?? 1);
$catex = intval($_GET['catex'] ?? -1);
$previousRows = ($pageNum - 1) * 500;
$sqlcon = '';
$sqlj = '';
$totalRecord = "LIMIT $previousRows, 500";
if ($catex !== -1) {
$sqlcon = "
AND tt.taxonomy = 'category'
AND t.term_id = '{$catex}'";
} else {
$sqlcon = "
AND tt.taxonomy = 'category'
AND t.term_id != '65'";
}
$sqlj = "
JOIN $wpdb->term_relationships tr ON (p.ID = tr.object_id)
JOIN $wpdb->term_taxonomy tt ON (tr.term_taxonomy_id = tt.term_taxonomy_id)
JOIN $wpdb->terms t ON (tt.term_id = t.term_id)";
$querystr = "SELECT
ID, post_title AS title, post_modified
FROM $wpdb->posts p
$sqlj
WHERE p.post_type = 'post' AND p.post_status = 'publish'
$sqlcon
ORDER BY post_date DESC
$totalRecord";
$pageposts = $wpdb->get_results($querystr, OBJECT);
if (empty($pageposts)) {
die('No posts found for the given criteria.');
}
// Prepare the output for CSV
header('Content-Type: text/csv; charset=utf-8');
header('Content-Disposition: attachment; filename="post_report_data.csv"');
header('Pragma: no-cache');
header('Expires: 0');
// Write CSV output
$output = fopen('php://output', 'w');
// Add CSV header
fputcsv($output, ['Title', 'Url', 'Feature Image Url', 'Category', 'Revision Date', 'Meta Description']);
// Fetch and write each post's data directly
foreach ($pageposts as $post) {
$url = wp_get_attachment_url(get_post_thumbnail_id($post->ID)); // wp_get_attachment_url() takes only the attachment ID
$categories = implode(', ', array_map(function ($cat) {
return $cat->cat_name;
}, get_the_category($post->ID)));
fputcsv($output, [
$post->title,
get_permalink($post->ID),
$url,
$categories,
get_the_modified_date('Y-m-d H:i:s', $post->ID),
get_post_meta($post->ID, '_yoast_wpseo_metadesc', true)
]);
// Flush output to the client periodically to keep memory usage low
if (ob_get_level() > 0) {
ob_flush();
}
flush();
}
fclose($output);
exit;
—
### Key Changes
1. **Pagination with `LIMIT`** ensures that only 500 rows are fetched per request.
2. **Input sanitization** (`intval`) avoids SQL injection.
3. **Flush inside loop** reduces memory usage and avoids buffering issues.
4. **Error handling** for empty results ensures user feedback.
—
### Testing the Script
– Test with different `pagen` and `catex` values in the URL.
– Ensure the script handles large datasets by testing with 10,000+ records.
– Monitor server logs to ensure no memory or timeout errors occur.
Let me know if you encounter any other issues!