Non-breaking spaces have different size #2269

ondras · 2024-10-06T12:14:53Z

In this rendering:

The first two words (o kliknutí) have an NBSP between them (Czech typography rules). However, IMHO this should only forbid line breaks; there is no reason for this space to have a different size than the other inter-word spaces on the same line.

liZe · 2024-10-09T08:29:59Z

You’re right. The current justification code adds extra space to space characters only. There are many places in the code where we assume that spaces are only "normal" spaces.

In this case, the problem is in:

WeasyPrint/weasyprint/layout/inline.py

Lines 1119 to 1125 in 1aae145

    
    def justify_line(context, line, extra_width):
        # TODO: We should use a better algorithm here, see
        # https://www.w3.org/TR/css-text-3/#justify-algos
        nb_spaces = count_spaces(line)
        if nb_spaces == 0:
            return
        add_word_spacing(context, line, extra_width / nb_spaces, 0)

We count spaces and use that count to set justification_spacing. We should change our space detection here:

WeasyPrint/weasyprint/layout/inline.py

Line 1131 in 1aae145

return box.text.count(' ')
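To make the effect concrete, here is a small standalone illustration (not WeasyPrint code) of why the NBSP ends up narrower. The sample line and the `extra_width` value are made up for the example.

```python
NBSP = '\u00a0'
line = f'o{NBSP}kliknutí na tlačítko'  # hypothetical justified line
extra_width = 12  # hypothetical extra width to distribute, arbitrary units

# Current behaviour: only U+0020 is counted, so the NBSP is ignored.
nb_spaces = line.count(' ')
per_space = extra_width / nb_spaces

# Counting the NBSP as well spreads the same width over one more gap.
nb_separators = nb_spaces + line.count(NBSP)
per_separator = extra_width / nb_separators

print(nb_spaces, per_space)          # 2 6.0
print(nb_separators, per_separator)  # 3 4.0
```

With the current counting, the two normal spaces each grow by 6 units while the NBSP keeps its natural width, which matches the uneven spacing reported above.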

And fix our spacing adjustment here:

WeasyPrint/weasyprint/text/line_break.py

Lines 182 to 196 in 1aae145

    
    if word_spacing:
        if bytestring == b' ':
            # We need more than one space to set word spacing
            self.text = ' \u200b'  # Space + zero-width space
            text, bytestring = unicode_to_char_p(self.text)
            pango.pango_layout_set_text(self.layout, text, -1)
        space_spacing = int(word_spacing * TO_UNITS + letter_spacing)
        position = bytestring.find(b' ')
        # Pango gives only half of word-spacing on boundaries
        boundary_positions = (0, len(bytestring) - 1)
        while position != -1:
            factor = 1 + (position in boundary_positions)
            add_attr(position, position + 1, factor * space_spacing)
            position = bytestring.find(b' ', position + 1)

For now we don’t have to support all justification opportunities as we don’t support text-justify, but we can at least support word separators. I’m not sure that a real list is actually defined in Unicode, as there are exceptions such as punctuation and fixed-width spaces. We can at least start with the list given by the specification.
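A rough sketch of what the detection could look like, assuming we take the word separators from the CSS Text Level 3 `word-spacing` definition (the list below should be double-checked against the specification). The names `WORD_SEPARATORS`, `count_word_separators` and `separator_byte_ranges` are hypothetical and do not exist in WeasyPrint. One thing to watch for: the spacing code above works on UTF-8 byte offsets and assumes a one-byte separator (`position + 1`), while NBSP is encoded on two bytes.

```python
import re

# Word separators as listed by CSS Text Level 3 (assumption: verify the
# exact list against the specification before relying on it).
WORD_SEPARATORS = (
    '\u0020',      # SPACE
    '\u00a0',      # NO-BREAK SPACE
    '\u1361',      # ETHIOPIC WORDSPACE
    '\U00010100',  # AEGEAN WORD SEPARATOR LINE
    '\U00010101',  # AEGEAN WORD SEPARATOR DOT
    '\U0001039f',  # UGARITIC WORD DIVIDER
    '\U0001091f',  # PHOENICIAN WORD SEPARATOR
)
_SEPARATOR_RE = re.compile('[' + ''.join(WORD_SEPARATORS) + ']')


def count_word_separators(text):
    """Count word separators, as a possible replacement for text.count(' ')."""
    return len(_SEPARATOR_RE.findall(text))


def separator_byte_ranges(text):
    """Yield (start, end) UTF-8 byte offsets of each word separator."""
    offset = 0
    for char in text:
        size = len(char.encode('utf-8'))
        if char in WORD_SEPARATORS:
            yield offset, offset + size
        offset += size


sample = 'o\u00a0kliknutí na'
print(count_word_separators(sample))        # 2
print(list(separator_byte_ranges(sample)))  # [(1, 3), (12, 13)]
```

The byte ranges could then be passed to something like `add_attr(start, end, ...)` instead of `position, position + 1`. The boundary handling (`boundary_positions`) would also need to compare against the start of the last character rather than `len(bytestring) - 1`.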

liZe added the bug Existing features not working as expected label Oct 9, 2024
