HTML to Text Conversion Help

User
Assistant
It looks like you're working with an older PHP class that converts HTML to plain text. What exactly do you need help with? Are you trying to modify or improve the class, debug an issue, or integrate it into a project?
User
$this->_build_link_list in above code not working
Assistant
The issue is that `$this->_build_link_list("\1", "\2")` is passed to `preg_replace()` as a replacement string, so PHP inserts it as literal text instead of calling the method. If the pattern relied on the `/e` modifier to evaluate the replacement, note that `/e` was removed in PHP 7.0.

### Solution:
You need to use `preg_replace_callback()` to properly handle the link list function.
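
To make the difference concrete, here is a minimal sketch. `format_link()` is a hypothetical stand-in for `_build_link_list()`, and the sample HTML is made up for illustration:

```php
<?php
// Hypothetical helper standing in for _build_link_list(); not the class's real method.
function format_link(string $url, string $label): string
{
    return "{$label} ({$url})";
}

$html = '<a href="https://example.com">this link</a>';
$pattern = '/<a [^>]*href="([^"]+)"[^>]*>(.*?)<\/a>/i';

// A string replacement is inserted as text; only the \1 and \2 backreferences are expanded.
$literal = preg_replace($pattern, 'format_link("\1", "\2")', $html);

// A callback receives the match array, and its return value becomes the replacement.
$called = preg_replace_callback(
    $pattern,
    fn (array $m): string => format_link($m[1], $m[2]),
    $html
);

echo $literal, PHP_EOL; // format_link("https://example.com", "this link")
echo $called, PHP_EOL;  // this link (https://example.com)
```

The first call shows the symptom from your script: the function name ends up in the output instead of being executed.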

#### **Fix the Link Replacement with `preg_replace_callback()`**
Modify the `_convert()` method like this:

private function _convert(): void
{
    $this->_link_count = 0;
    $this->_link_list = '';

    // Ensure $this->html is a string (fallback to an empty string if null)
    $text = trim(stripslashes($this->html ?? ''));
    $text = strip_tags($text, $this->allowed_tags);
    $text = html_entity_decode($text);

    // Use preg_replace_callback for link handling
    $text = preg_replace_callback(
        '/<a [^>]*href="([^"]+)"[^>]*>(.*?)<\/a>/i',
        function ($matches) {
            return $this->_build_link_list($matches[1], $matches[2]);
        },
        $text
    );

    // Use preg_replace for other replacements
    $text = preg_replace($this->search, $this->replace, $text);

    // Normalize newlines
    $text = preg_replace("/\n\s+\n/", "\n\n", $text);
    $text = preg_replace("/[\n]{3,}/", "\n\n", $text);

    // Append links if there are any
    if (!empty($this->_link_list)) {
        $text .= "\n\nLinks:\n" . $this->_link_list;
    }

    $this->text = $text;
    $this->_converted = true;
}

#### **Fix the `_build_link_list()` method**
If `_build_link_list` is missing, add this method inside the `html2text` class:

private function _build_link_list($link, $display): string
{
    $this->_link_count++;
    $this->_link_list .= "[{$this->_link_count}] {$link}\n";
    return "{$display} [{$this->_link_count}]";
}

### **Explanation of Fix**
1. **Replaced `preg_replace()` with `preg_replace_callback()`**
– This ensures that `$this->_build_link_list()` is executed dynamically when matching `<a>` tags.

2. **Ensured `_build_link_list()` properly stores and returns link references.**
– Adds numbered references for each link at the end of the text.

### **Example Output**
#### **HTML Input**

<p>Check out <a href="https://example.com">this link</a>!</p>

#### **Plain Text Output**

Check out this link [1]!

Links:
[1] https://example.com

This should now work correctly in your script! 🚀
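
If you want to check the example end to end, here is a stripped-down sketch. `LinkTextDemo` and its member names are hypothetical, and it reproduces only the link handling, not the rest of your `html2text` class:

```php
<?php
// Hypothetical stand-in for the html2text class; only the link handling is reproduced.
final class LinkTextDemo
{
    private int $linkCount = 0;
    private string $linkList = '';

    public function convert(string $html): string
    {
        $this->linkCount = 0;
        $this->linkList = '';

        // Replace links first, while the <a> tags are still present.
        $text = preg_replace_callback(
            '/<a [^>]*href="([^"]+)"[^>]*>(.*?)<\/a>/i',
            fn (array $m): string => $this->buildLinkList($m[1], $m[2]),
            $html
        );

        // Drop the remaining tags and decode entities.
        $text = html_entity_decode(strip_tags($text));

        if ($this->linkList !== '') {
            $text .= "\n\nLinks:\n" . $this->linkList;
        }
        return $text;
    }

    private function buildLinkList(string $link, string $display): string
    {
        $this->linkCount++;
        $this->linkList .= "[{$this->linkCount}] {$link}\n";
        return "{$display} [{$this->linkCount}]";
    }
}

echo (new LinkTextDemo())->convert(
    '<p>Check out <a href="https://example.com">this link</a>!</p>'
);
```

Note that this sketch runs the link replacement before `strip_tags()`. In `_convert()` above, `strip_tags()` runs first, so the link pattern can only match if `<a>` is included in `$this->allowed_tags`.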

User
Assistant
It looks like your `_build_link_list` method is missing from the class, but it is referenced in your `$this->replace` array for handling `<a>` elements.

### Fixing `_build_link_list` Issue:
You need to define the `_build_link_list` method inside the `html2text` class. Add this function:

private function _build_link_list($url, $display): string
{
    $this->_link_count++;
    $link = "[{$this->_link_count}] $display ($url)";
    
    // Append to link list
    $this->_link_list .= "\n$link";

    return " [$this->_link_count]"; // Reference number for inline links
}

### Explanation:
– This function increments `_link_count` each time it processes a link.
– It appends the URL and link text to `_link_list`.
– It replaces the anchor tag with a numbered reference `[1]`, `[2]`, etc.

### Possible Additional Fix:
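If `_build_link_list` is not called when it is used inside `preg_replace`, switch to a callback instead. An array that maps a pattern to a closure is only accepted by `preg_replace_callback_array()`, not by `preg_replace()`.

Modify your search-replace pair for `<a>`: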

[
    '/<a [^>]*href="([^"]+)"[^>]*>(.*?)<\/a>/i' => function ($matches) {
        return $this->_build_link_list($matches[1], $matches[2]);
    }
]

Because the closure is defined inside a non-static method, `$this` is bound automatically, so it can call `$this->_build_link_list()` even though the method is private.
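
As a rough sketch of how such a pattern-to-closure array is consumed, here is a standalone call to `preg_replace_callback_array()` (PHP 7.0+). The `$links` array and the second rule for `<b>` are made up for illustration and are not part of your class:

```php
<?php
$links = [];

$text = preg_replace_callback_array(
    [
        // Links become "label [n]" and their URLs are collected for a footer list.
        '/<a [^>]*href="([^"]+)"[^>]*>(.*?)<\/a>/i' => function (array $m) use (&$links): string {
            $links[] = $m[1];
            return $m[2] . ' [' . count($links) . ']';
        },
        // Hypothetical second rule: bold text becomes upper case.
        '/<b>(.*?)<\/b>/i' => fn (array $m): string => strtoupper($m[1]),
    ],
    '<b>Docs</b>: see <a href="https://example.com/a">the guide</a>.'
);

echo $text, PHP_EOL; // DOCS: see the guide [1].
```

The patterns are applied in array order, each one to the result of the previous one.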

Try these fixes and let me know if you need further help! 🚀

User
function exist
private function _build_link_list($link, $display): string
{
if (substr($link, 0, 7) == 'http://' || substr($link, 0, 8) == 'https://' ||
substr($link, 0, 7) == 'mailto:') {
$this->_link_count++;
$this->_link_list .= "[" . $this->_link_count . "] $link\n";
$additional = ' [' . $this->_link_count . ']';
} elseif (substr($link, 0, 11) == 'javascript:') {
$additional = '';
} else {
$this->_link_count++;
$this->_link_list .= "[" . $this->_link_count . "] " . $this->url;
if (substr($link, 0, 1) != '/') {
$this->_link_list .= '/';
}
$this->_link_list .= "$link\n";
$additional = ' [' . $this->_link_count . ']';
}

return $display . $additional;

}

Assistant
Yes, the function `_build_link_list` exists in your code. It is a **private function** that builds a list of links by appending them to `$_link_list`. It also keeps track of the number of links using `$_link_count`.

### **How It Works:**
1. **Checks the link type**
– If the `$link` starts with `http://`, `https://`, or `mailto:`, it increments `$_link_count`, stores the link in `$_link_list`, and appends a reference number.
– If the `$link` starts with `javascript:`, it is left out of `$_link_list` and no reference number is added.
– Otherwise, it assumes it's a relative link, appends it to `$_link_list`, and assigns a reference number.

2. **Returns formatted output**
– It returns the `$display` string with an additional reference number (if applicable).

### **Potential Issues or Considerations:**
1. **Undefined Properties?**
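– If `$this->_link_list` and `$this->_link_count` are not declared and initialized, PHP raises an "Undefined property" notice or warning on first use, depending on the PHP version (or throws an `Error` if they are typed properties without a default value).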
2. **Possible Undefined `$this->url`**
– It references `$this->url` but doesn't check if it exists.
3. **Trailing Slash Issue for Relative Links**
– If `$this->url` ends with `/` and `$link` is a root-relative path (`/example`), the two are joined with a double slash.
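
To make the slash problem concrete, here is a small sketch. It assumes `$this->url` holds a base URL that ends in `/`; `join_url()` is a hypothetical helper, not something in your class:

```php
<?php
// Hypothetical helper: joins a base URL and a relative link with exactly one slash.
function join_url(string $base, string $link): string
{
    return rtrim($base, '/') . '/' . ltrim($link, '/');
}

$base = 'https://example.com/';   // assumed value of $this->url
$link = '/example';

// What the relative-link branch builds: base, an optional slash, then the link.
$naive = $base . (substr($link, 0, 1) != '/' ? '/' : '') . $link;

echo $naive, PHP_EOL;                  // https://example.com//example
echo join_url($base, $link), PHP_EOL;  // https://example.com/example
```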

Would you like me to help improve or debug this function? 🚀

User
fix this
Assistant
Here's the improved version of your `_build_link_list` function with proper initialization, handling of `$this->url`, and fixes for potential issues:

### **Fixes & Improvements:**
✅ **Ensures `_link_list` and `_link_count` are initialized**
✅ **Checks if `$this->url` is set before using it**
✅ **Fixes trailing slash issue for relative links**
✅ **Uses `str_starts_with()` (PHP 8+) for better readability**

### **Fixed Code:**

private function _build_link_list($link, $display): string
{
    // Ensure _link_count and _link_list are initialized
    if (!isset($this->_link_count)) {
        $this->_link_count = 0;
    }
    if (!isset($this->_link_list)) {
        $this->_link_list = '';
    }

    // Initialize additional text
    $additional = '';

    // Check for absolute links (http, https, mailto)
    if (str_starts_with($link, 'http://') || str_starts_with($link, 'https://') || str_starts_with($link, 'mailto:')) {
        $this->_link_count++;
        $this->_link_list .= "[" . $this->_link_count . "] $link\n";
        $additional = " [{$this->_link_count}]";

    // Check for JavaScript links
    } elseif (str_starts_with($link, 'javascript:')) {
        $additional = '';

    // Handle relative links
    } else {
        $this->_link_count++;

        // Ensure $this->url is set before using it
        $baseUrl = isset($this->url) ? rtrim($this->url, '/') : '';

        // Construct the full URL
        $fullLink = $baseUrl . '/' . ltrim($link, '/');

        $this->_link_list .= "[" . $this->_link_count . "] $fullLink\n";
        $additional = " [{$this->_link_count}]";
    }

    return $display . $additional;
}

### **What’s Fixed?**
1. **Ensures `_link_count` and `_link_list` are always set**
2. **Uses `str_starts_with()` for cleaner string comparisons**
3. **Fixes the potential issue with `$this->url` being `null`**
4. **Prevents duplicate slashes when building relative URLs**
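
One caveat on `str_starts_with()`: it only exists in PHP 8 and later. If the script may still run on PHP 7, a fallback along these lines could be loaded before the class. This is a sketch based on `strncmp()`, not an official polyfill:

```php
<?php
// Fallback for PHP 7.x only; PHP 8+ ships str_starts_with() natively.
if (!function_exists('str_starts_with')) {
    function str_starts_with(string $haystack, string $needle): bool
    {
        // Compare only the first strlen($needle) bytes of both strings.
        return strncmp($haystack, $needle, strlen($needle)) === 0;
    }
}

var_dump(str_starts_with('mailto:me@example.com', 'mailto:')); // bool(true)
var_dump(str_starts_with('/relative/path', 'http://'));        // bool(false)
```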

Would you like any additional modifications? 🚀
