PHP CSV Memory Issues
January 28, 2025 · AI Conversations
—
### **1. Increase PHP Memory Limit**
Increase the memory limit in your `php.ini` file:
```ini
memory_limit = 512M
```

You can also set it dynamically in your script:

```php
ini_set('memory_limit', '512M'); // Adjust as needed
```
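On some hosts runtime configuration is restricted, so it's worth confirming that the new limit actually took effect; a quick check:

```php
ini_set('memory_limit', '512M');

// Confirm the runtime change was accepted.
if (ini_get('memory_limit') !== '512M') {
    error_log('memory_limit is still ' . ini_get('memory_limit') . '; raise it in php.ini instead.');
}
```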
—
### **2. Stream Directly to the Output**
Instead of building the entire CSV in memory, write each row directly to the output stream to keep memory usage low:

```php
header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="large_file.csv"');

$output = fopen('php://output', 'w');

// Add the header row
fputcsv($output, ['Column1', 'Column2', 'Column3']);

// Fetch rows and write them to the CSV one at a time
while ($row = fetchLargeDataChunk()) { // Replace with your data fetch logic
    fputcsv($output, $row);
}

fclose($output);
exit;
```
—
### **3. Use Chunked Data Fetching**
Avoid loading all data into memory at once. Fetch rows as you go instead of pulling the whole result set into PHP:

```php
$pdo = new PDO(...); // Your database connection

// For MySQL, disable buffered queries so rows stream from the server
// instead of the whole result set being loaded into memory first.
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$stmt = $pdo->prepare('SELECT * FROM large_table');
$stmt->execute();

header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="large_file.csv"');

$output = fopen('php://output', 'w');
fputcsv($output, ['Column1', 'Column2', 'Column3']);

while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    fputcsv($output, $row);
}

fclose($output);
exit;
```
—
### **4. Disable Unnecessary Features**
Turn off PHP settings that buffer or transform the response in memory, such as output compression:

```php
ini_set('zlib.output_compression', 'Off');
```
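In the same vein, if WordPress or other code has already started output buffering, the whole CSV can end up held in memory before anything is sent. A small sketch that removes any active buffers before the first `fputcsv()` call:

```php
// Discard and close any active output buffers so each row is sent
// to the client immediately instead of accumulating in memory.
while (ob_get_level() > 0) {
    ob_end_clean();
}
```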
—
### **5. Use a Temporary File**
Write the CSV to a temporary file instead of keeping it in memory:

```php
$tempFile = tmpfile();
fputcsv($tempFile, ['Column1', 'Column2', 'Column3']);

// Add rows
while ($row = fetchLargeDataChunk()) {
    fputcsv($tempFile, $row);
}

// Stream the file to the browser
header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="large_file.csv"');

rewind($tempFile);
fpassthru($tempFile);
fclose($tempFile);
exit;
```
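Since the data now lives in a real file, you can optionally tell the browser how large the download is; this small addition would go right after the `header()` calls above, before `fpassthru()`:

```php
// Optional: advertise the download size so the browser can show progress.
$stat = fstat($tempFile);
header('Content-Length: ' . $stat['size']);
```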
—
### **6. Use CLI for Large Exports**
If possible, handle large CSV exports with a PHP script run from the command line, where the execution time limit is disabled by default and you can raise the memory limit without affecting web requests:

```bash
php export_csv.php
```
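A minimal sketch of what such a script might look like, using the same `fetchLargeDataChunk()` placeholder as the earlier snippets (the output path is illustrative):

```php
<?php
// export_csv.php - run with: php export_csv.php
// Writes the CSV to a file on disk instead of streaming it to a browser.
ini_set('memory_limit', '1G'); // Raise only if needed; CLI limits don't affect web requests

$file = fopen(__DIR__ . '/large_file.csv', 'w');
fputcsv($file, ['Column1', 'Column2', 'Column3']);

while ($row = fetchLargeDataChunk()) { // Replace with your data fetch logic
    fputcsv($file, $row);
}

fclose($file);
echo "Export finished\n";
```

The finished file can then be served as a static download or pushed to storage, keeping the heavy work out of the web request entirely.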
—
### **7. Optimize Database Queries**
Fetch only the required columns and limit the number of records fetched at a time:
```sql
SELECT column1, column2, column3 FROM large_table LIMIT 1000 OFFSET 0;
```
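To walk through the whole table this way, increase the offset on each pass until a batch comes back empty. A sketch of that loop, reusing the `$pdo` connection and `$output` stream from the snippet in section 3 (table and column names are illustrative):

```php
$batchSize = 1000;
$offset    = 0;

do {
    // %d guarantees the limit/offset are plain integers in the SQL string.
    $sql = sprintf(
        'SELECT column1, column2, column3 FROM large_table LIMIT %d OFFSET %d',
        $batchSize,
        $offset
    );
    $rows = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);

    foreach ($rows as $row) {
        fputcsv($output, $row);
    }

    $offset += $batchSize;
} while (count($rows) === $batchSize);
```

Note that very large offsets get slower, since the database still has to skip all earlier rows; keyset pagination (a `WHERE id > :lastId ORDER BY id` condition) scales better if the table has an auto-increment key.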
—
### **8. Compress the Output**
Compress the output on the fly. This does not reduce PHP's memory usage, but it shrinks the download and cuts transfer time:

```php
// Content-Encoding: gzip tells the browser to decompress transparently,
// so the file it saves is plain CSV.
header('Content-Type: text/csv');
header('Content-Encoding: gzip');
header('Content-Disposition: attachment; filename="large_file.csv"');

$output = gzopen('php://output', 'w');
gzwrite($output, 'Column1,Column2,Column3' . PHP_EOL);

// Note: implode() does not quote or escape fields, so this only works
// for values without commas, quotes, or newlines.
while ($row = fetchLargeDataChunk()) {
    gzwrite($output, implode(',', $row) . PHP_EOL);
}

gzclose($output);
exit;
```
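If your data can contain commas, quotes, or line breaks, you can still let `fputcsv()` do the escaping by writing each row through a small reusable in-memory stream first; a sketch using the same `fetchLargeDataChunk()` placeholder:

```php
$line = fopen('php://temp', 'w+'); // Reusable scratch stream for one CSV row at a time

while ($row = fetchLargeDataChunk()) {
    ftruncate($line, 0);
    rewind($line);
    fputcsv($line, $row); // Handles quoting and escaping
    rewind($line);
    gzwrite($output, stream_get_contents($line));
}

fclose($line);
```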
—
If you share more about your setup or how you're generating the CSV, I can provide a more specific solution.
—
### Key Improvements:
1. **Avoid Loading Everything in Memory**:
Your script loads all rows into an array before converting them to CSV. This approach consumes excessive memory for large datasets. Instead, write data directly to the output buffer in chunks.
2. **Stream Data Instead of Using Memory**:
Use `php://output` instead of memory-based CSV creation. Here’s an updated version of your script:
—
### Optimized Code:
```php
<?php
/* Template Name: Download Data */
global $wpdb;

// Pagination and conditions
$pageNum      = isset($_GET['pagen']) ? (int) $_GET['pagen'] : 1;
$previousRows = ($pageNum - 1) * 500; // Note: offset advances by 500 per page while each page fetches up to 2000 rows; align these if pages should not overlap
$totalRecord  = '';
$sqlcon       = '';
$sqlj         = '';

if (!empty($_GET['catex'])) {
    $catex = (int) $_GET['catex']; // Cast to int so user input never reaches the SQL string directly

    if ($catex != -1) {
        $sqlcon = "
            AND tt.taxonomy = 'category'
            AND t.term_id = '$catex'";
        $totalRecord = "LIMIT $previousRows, 2000";
    } else {
        $sqlcon = "
            AND tt.taxonomy = 'category'
            AND t.term_id != '65'"; // All except category 65
    }

    $sqlj = "
        JOIN $wpdb->term_relationships tr ON (p.ID = tr.object_id)
        JOIN $wpdb->term_taxonomy tt ON (tr.term_taxonomy_id = tt.term_taxonomy_id)
        JOIN $wpdb->terms t ON (tt.term_id = t.term_id)";
}

// Fetch posts query
$querystr = "SELECT
        p.ID, p.post_title AS title, p.post_modified
    FROM $wpdb->posts p
    $sqlj
    WHERE p.post_type = 'post' AND p.post_status = 'publish' $sqlcon
    ORDER BY p.post_date DESC
    $totalRecord";

// Headers for CSV download
header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="post_report_data.csv"');

// Open output stream
$output = fopen('php://output', 'w');

// Add CSV headers
fputcsv($output, ['Title', 'Url', 'Feature Image Url', 'Category', 'Revision Date', 'Meta Description']);

// Fetch this page of posts and write each row straight to the output stream
$pageposts = $wpdb->get_results($querystr, OBJECT);

foreach ($pageposts as $post) {
    $url = wp_get_attachment_url(get_post_thumbnail_id($post->ID)); // wp_get_attachment_url() takes only the attachment ID

    $categories = implode(', ', array_map(function ($cat) {
        return $cat->cat_name;
    }, get_the_category($post->ID)));

    $modifiedDate    = get_the_modified_date('Y-m-d', $post->ID);
    $metaDescription = get_post_meta($post->ID, '_yoast_wpseo_metadesc', true);

    fputcsv($output, [
        $post->title,
        get_permalink($post->ID),
        $url,
        $categories,
        $modifiedDate,
        $metaDescription,
    ]);
}

// Close output stream
fclose($output);
exit;
```
—
### Improvements Explained:
1. **Streaming with `php://output`**:
– `fputcsv()` writes each row directly to the output stream instead of storing rows in memory.
2. **Pagination**:
– When a specific category is selected, the SQL `LIMIT` clause caps each page at 2000 records, preventing memory exhaustion.
3. **Efficient Looping**:
– By fetching and processing records in chunks, memory usage stays low.
4. **Avoiding Excessive Memory Consumption**:
– Removed the `$dataA` array, which stored all rows in memory.
5. **Download Headers**:
– Proper headers are set so the browser treats the response as a CSV file download.
—
### Additional Recommendations:
1. **Test for Large Datasets**:
   If your dataset is huge (e.g., >100,000 rows), increase PHP's memory and execution time limits:
   ```php
   ini_set('memory_limit', '512M');      // or higher, as needed
   ini_set('max_execution_time', '300'); // 5 minutes
   ```
2. **Debugging Errors**:
   Enable error logging in WordPress (in `wp-config.php`) to debug issues:
   ```php
   define('WP_DEBUG', true);
   define('WP_DEBUG_LOG', true);
   ```
3. **Database Indexing**:
   Ensure columns used in `JOIN` and `WHERE` conditions (e.g., `term_id`, `post_date`) are indexed for faster queries; a quick way to check the query plan is shown below.
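As a quick sanity check, you can ask MySQL for the query plan of the generated statement (a sketch, assuming it runs in the same template where `$querystr` is built):

```php
// Inspect the query plan; full table scans ("type: ALL") on large tables
// usually mean a missing index.
$plan = $wpdb->get_results('EXPLAIN ' . $querystr);
error_log(print_r($plan, true));
```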
Let me know if you face any issues with the implementation.