HTML to Text Conversion Help
February 25, 2025 · AI Conversations
### Solution:
You need to use `preg_replace_callback()` to properly handle the link list function.
#### **Fix the Link Replacement with `preg_replace_callback()`**
Modify the `_convert()` method like this:
private function _convert(): void
{
    $this->_link_count = 0;
    $this->_link_list = '';
    // Ensure $this->html is a string (fallback to an empty string if null)
    $text = trim(stripslashes($this->html ?? ''));
    // Note: $this->allowed_tags must include <a>, otherwise strip_tags()
    // removes the anchors before the link pattern below can match them
    $text = strip_tags($text, $this->allowed_tags);
    $text = html_entity_decode($text);
    // Use preg_replace_callback for link handling
    $text = preg_replace_callback(
        '/<a [^>]*href="([^"]+)"[^>]*>(.*?)<\/a>/i',
        function ($matches) {
            return $this->_build_link_list($matches[1], $matches[2]);
        },
        $text
    );
    // Use preg_replace for other replacements
    $text = preg_replace($this->search, $this->replace, $text);
    // Normalize newlines
    $text = preg_replace("/\n\s+\n/", "\n\n", $text);
    $text = preg_replace("/[\n]{3,}/", "\n\n", $text);
    // Append links if there are any
    if (!empty($this->_link_list)) {
        $text .= "\n\nLinks:\n" . $this->_link_list;
    }
    $this->text = $text;
    $this->_converted = true;
}
#### **Fix the `_build_link_list()` method**
If `_build_link_list` is missing, add this method inside the `html2text` class:
private function _build_link_list($link, $display): string
{
    $this->_link_count++;
    // Record the URL under its reference number, one entry per line
    $this->_link_list .= "[{$this->_link_count}] {$link}\n";
    return "{$display} [{$this->_link_count}]";
}
### **Explanation of Fix**
1. **Replaced `preg_replace()` with `preg_replace_callback()`**
– This ensures that `$this->_build_link_list()` is executed dynamically when matching `<a>` tags.
2. **Ensured `_build_link_list()` properly stores and returns link references.**
– Adds numbered references for each link at the end of the text.
### **Example Output**
#### **HTML Input**
<p>Check out <a href="https://example.com">this link</a>!</p>
#### **Plain Text Output**
Check out this link [1]!
Links:
[1] https://example.com
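To make the example above reproducible outside the full class, here is a minimal, self-contained sketch of the same idea. `LinkCollector` and its property names are hypothetical stand-ins, not part of the `html2text` class, and the sketch only handles double-quoted `href` attributes.

```php
<?php
// Hypothetical stand-in for the html2text class; only link handling is sketched.
final class LinkCollector
{
    private int $linkCount = 0;
    private string $linkList = '';

    public function convert(string $html): string
    {
        $this->linkCount = 0;
        $this->linkList = '';

        // Replace each <a href="...">text</a> with "text [n]" and record the URL.
        $text = preg_replace_callback(
            '/<a\s[^>]*href="([^"]+)"[^>]*>(.*?)<\/a>/is',
            function (array $m): string {
                $this->linkCount++;
                $this->linkList .= "[{$this->linkCount}] {$m[1]}\n";
                return "{$m[2]} [{$this->linkCount}]";
            },
            $html
        );

        // Drop the remaining tags only after the links have been rewritten.
        $text = trim(strip_tags($text));

        if ($this->linkList !== '') {
            $text .= "\n\nLinks:\n" . $this->linkList;
        }
        return $text;
    }
}

echo (new LinkCollector())->convert(
    '<p>Check out <a href="https://example.com">this link</a>!</p>'
);
```

The sketch runs the link replacement before `strip_tags()`, so it does not depend on `<a>` being in an allowed-tags list. That ordering is a choice made for this illustration.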
This should now work correctly in your script!
### Fixing `_build_link_list` Issue:
You need to define the `_build_link_list` method inside the `html2text` class. Add this function:
private function _build_link_list($url, $display): string
{
    $this->_link_count++;
    $link = "[{$this->_link_count}] $display ($url)";
    // Append to link list
    $this->_link_list .= "\n$link";
    return " [$this->_link_count]"; // Reference number for inline links
}
### Explanation:
– This function increments `_link_count` each time it processes a link.
– It appends the URL and link text to `_link_list`.
– It replaces the anchor tag with a numbered reference `[1]`, `[2]`, etc.
### Possible Additional Fix:
If `_build_link_list` is not being called from `preg_replace`, use `preg_replace_callback` instead. If your patterns are kept in an array, `preg_replace_callback_array` accepts pattern-to-callback pairs.
Modify your search-replace pair for `<a>`:
[
    '/<a [^>]*href="([^"]+)"[^>]*>(.*?)<\/a>/i' => function ($matches) {
        return $this->_build_link_list($matches[1], $matches[2]);
    }
]
If your class is not static, ensure you call `$this->_build_link_list` correctly from within the class.
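As a rough illustration of the array form, the sketch below passes two pattern-to-callback pairs to `preg_replace_callback_array()`. The function name `rewrite_inline` and the bold-to-uppercase rule are assumptions made for the example, not taken from your class.

```php
<?php
// Hypothetical helper: applies each pattern's callback in turn to the input.
function rewrite_inline(string $html): string
{
    $count = 0;
    $links = [];

    $text = preg_replace_callback_array(
        [
            // Links become "text [n]"; the URL is stored for the footer list.
            '/<a\s[^>]*href="([^"]+)"[^>]*>(.*?)<\/a>/is' =>
                function (array $m) use (&$count, &$links): string {
                    $count++;
                    $links[] = "[$count] {$m[1]}";
                    return "{$m[2]} [$count]";
                },
            // Illustrative second rule: bold text is uppercased.
            '/<(?:b|strong)>(.*?)<\/(?:b|strong)>/is' =>
                function (array $m): string {
                    return strtoupper($m[1]);
                },
        ],
        $html
    );

    return $links ? $text . "\n\n" . implode("\n", $links) : $text;
}

echo rewrite_inline('<b>Hi</b> <a href="https://example.com">site</a>');
```

Outside a class the callbacks cannot use `$this`, so the counter and list are captured by reference with `use (&$count, &$links)`. Inside the class, the closures can call `$this->_build_link_list()` directly.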
Try these fixes and let me know if you need further help!
private function _build_link_list($link, $display): string
{
    if (substr($link, 0, 7) == 'http://' || substr($link, 0, 8) == 'https://' ||
        substr($link, 0, 7) == 'mailto:') {
        $this->_link_count++;
        $this->_link_list .= "[" . $this->_link_count . "] $link\n";
        $additional = ' [' . $this->_link_count . ']';
    } elseif (substr($link, 0, 11) == 'javascript:') {
        $additional = '';
    } else {
        $this->_link_count++;
        $this->_link_list .= "[" . $this->_link_count . "] " . $this->url;
        if (substr($link, 0, 1) != '/') {
            $this->_link_list .= '/';
        }
        $this->_link_list .= "$link\n";
        $additional = ' [' . $this->_link_count . ']';
    }
    return $display . $additional;
}
### **How It Works:**
1. **Checks the link type**
– If the `$link` starts with `http://`, `https://`, or `mailto:`, it increments `$_link_count`, stores the link in `$_link_list`, and appends a reference number.
– If the `$link` starts with `javascript:`, it adds no reference number and leaves the link out of the list.
– Otherwise, it assumes it's a relative link, appends it to `$_link_list`, and assigns a reference number.
2. **Returns formatted output**
– It returns the `$display` string with an additional reference number (if applicable).
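The three branches described above can be sketched as a small standalone function. `classify_link` and its return labels are hypothetical names used only to make the branching easy to test; `strncmp()` is used so the sketch also runs on PHP versions before 8.

```php
<?php
// Hypothetical helper mirroring the three branches of _build_link_list().
function classify_link(string $link): string
{
    foreach (['http://', 'https://', 'mailto:'] as $prefix) {
        if (strncmp($link, $prefix, strlen($prefix)) === 0) {
            return 'absolute';   // listed as-is with a reference number
        }
    }
    if (strncmp($link, 'javascript:', 11) === 0) {
        return 'skipped';        // no reference number, not listed
    }
    return 'relative';           // listed with the base URL prepended
}

foreach (['https://example.com', 'javascript:void(0)', 'docs/page.html'] as $link) {
    echo $link, ' => ', classify_link($link), "\n";
}
```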
### **Potential Issues or Considerations:**
1. **Undefined Properties?**
– If `$this->_link_list` and `$this->_link_count` are not initialized, this may throw an error.
2. **Possible Undefined `$this->url`**
– It references `$this->url` but doesn't check if it exists.
3. **Trailing Slash Issue for Relative Links**
– If `$this->url` already ends with `/` and `$link` is a root-relative path (`/example`), the joined URL contains a double slash.
Would you like me to help improve or debug this function?
### **Fixes & Improvements:**
✅ **Ensures `_link_list` and `_link_count` are initialized**
✅ **Checks if `$this->url` is set before using it**
✅ **Fixes trailing slash issue for relative links**
✅ **Uses `str_starts_with()` (PHP 8+) for better readability**
### **Fixed Code:**
private function _build_link_list($link, $display): string
{
    // Ensure _link_count and _link_list are initialized
    if (!isset($this->_link_count)) {
        $this->_link_count = 0;
    }
    if (!isset($this->_link_list)) {
        $this->_link_list = '';
    }
    // Initialize additional text
    $additional = '';
    // Check for absolute links (http, https, mailto)
    if (str_starts_with($link, 'http://') || str_starts_with($link, 'https://') || str_starts_with($link, 'mailto:')) {
        $this->_link_count++;
        $this->_link_list .= "[" . $this->_link_count . "] $link\n";
        $additional = " [{$this->_link_count}]";
    // Check for JavaScript links
    } elseif (str_starts_with($link, 'javascript:')) {
        $additional = '';
    // Handle relative links
    } else {
        $this->_link_count++;
        // Ensure $this->url is set before using it
        $baseUrl = isset($this->url) ? rtrim($this->url, '/') : '';
        // Construct the full URL
        $fullLink = $baseUrl . '/' . ltrim($link, '/');
        $this->_link_list .= "[" . $this->_link_count . "] $fullLink\n";
        $additional = " [{$this->_link_count}]";
    }
    return $display . $additional;
}
### **What's Fixed?**
1. **Ensures `_link_count` and `_link_list` are always set**
2. **Uses `str_starts_with()` for cleaner string comparisons**
3. **Fixes the potential issue with `$this->url` being `null`**
4. **Prevents duplicate slashes when building relative URLs**
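To see the slash handling from point 4 in isolation, here is a small sketch of the same `rtrim()`/`ltrim()` join. `join_url` is a hypothetical helper name, and the base URLs are placeholders.

```php
<?php
// Hypothetical helper: joins a base URL and a link with exactly one slash.
// A null base is treated as empty, giving a root-relative path.
function join_url(?string $baseUrl, string $link): string
{
    return rtrim($baseUrl ?? '', '/') . '/' . ltrim($link, '/');
}

echo join_url('https://example.com/', '/about'), "\n";
echo join_url('https://example.com', 'about'), "\n";
echo join_url(null, 'about'), "\n";
```

Whichever side carries the slash, the result has a single separator. With no base URL the link falls back to a root-relative path, which may or may not be what you want in the final link list.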
Would you like any additional modifications?