[FEATURE] tool for translating unicode offsets to utf-16 offsets of MessageEntity #4319

@Antares0982

What kind of feature are you missing? Where do you notice a shortcoming of PTB?

I want to send a message with custom MessageEntity objects. At first I assumed PTB already handled the translation from Unicode (code point) offsets to UTF-16 offsets. It works for most texts, but it turns out to fail when the message contains emoji, because such characters count as one character in a Python str but occupy two UTF-16 code units.
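To illustrate the mismatch: Python's `len()` counts code points, whereas entity offsets are measured in UTF-16 code units, and characters outside the Basic Multilingual Plane (most emoji) need two code units each. A minimal sketch follows; the helper name `utf16_len` is made up for this example and is not part of PTB.

```python
def utf16_len(text: str) -> int:
    """Number of UTF-16 code units needed to encode `text`."""
    # "utf-16-le" writes no byte order mark and uses 2 bytes per code unit.
    return len(text.encode("utf-16-le")) // 2


plain = "hello"
with_emoji = "hi \U0001F600 there"  # U+1F600 is a single emoji code point

# For plain text the two counts agree; with the emoji they differ by one.
print(len(plain), utf16_len(plain))
print(len(with_emoji), utf16_len(with_emoji))
```

Any entity that starts after the emoji would therefore be shifted by one code unit if its offset were computed with plain string indexing.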

Describe the solution you'd like

I'd like a new class, UnicodeMessageEntity, whose offset and length are given in Unicode code points. Currently I use this simple workaround:

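    from telegram import MessageEntity

    def fix_entities_offset(self):
        # Assumes each entity list is sorted by offset and has no overlapping entities.
        for text, entities in zip(self.texts, self.entities):
            cur_index = 0  # code points consumed so far
            accumulated_len = 0  # UTF-16 bytes consumed so far
            for i, entity in enumerate(entities):
                # Gap between the previous entity and this one.
                cur_text = text[cur_index:entity.offset]
                accumulated_len += len(cur_text.encode('utf-16-le'))
                cur_off = accumulated_len // 2  # 2 bytes per UTF-16 code unit
                # The entity's own text.
                cur_text = text[entity.offset:entity.offset+entity.length]
                accumulated_len += len(cur_text.encode('utf-16-le'))
                cur_len = accumulated_len // 2 - cur_off
                entities[i] = MessageEntity(
                    type=entity.type, offset=cur_off, length=cur_len,
                    url=entity.url, user=entity.user, language=entity.language,
                    custom_emoji_id=entity.custom_emoji_id,  # keep the other fields too
                )
                cur_index = entity.offset + entity.length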
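
The workaround above relies on the entities being sorted and non-overlapping. A possible variant that drops that assumption converts each span independently by measuring the text prefix before its start and before its end. This is only a sketch: it works on plain `(offset, length)` tuples rather than `MessageEntity` objects, and `utf16_units` and `to_utf16_spans` are hypothetical names, not PTB functions.

```python
from typing import List, Tuple


def utf16_units(text: str) -> int:
    """Number of UTF-16 code units in `text` (2 bytes per unit, no BOM)."""
    return len(text.encode("utf-16-le")) // 2


def to_utf16_spans(text: str, spans: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Convert (offset, length) pairs from code points to UTF-16 code units."""
    result = []
    for offset, length in spans:
        # Each span is measured from the start of the text, so the order of
        # the spans and any nesting or overlap between them do not matter.
        start = utf16_units(text[:offset])
        end = utf16_units(text[:offset + length])
        result.append((start, end - start))
    return result


text = "\U0001F600 bold \U0001F600 italic"
spans = [(2, 4), (9, 6)]  # "bold" and "italic" in code point offsets
print(to_utf16_spans(text, spans))
```

Re-encoding the prefix for every span is quadratic in the worst case, which should be acceptable for message-sized texts; the incremental approach in the workaround avoids that cost at the price of the ordering assumption.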

It would be nice if this could be applied automatically when sending messages, whenever an entity object is an instance of UnicodeMessageEntity.
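
One way the requested behaviour could look, sketched with plain dataclasses so that it runs without PTB installed. Everything here is hypothetical: `Entity` stands in for `telegram.MessageEntity`, and `UnicodeEntity` and `normalize_entities` are made-up names that illustrate the proposal, not existing PTB API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entity:
    """Stand-in for telegram.MessageEntity; offsets are UTF-16 code units."""
    type: str
    offset: int
    length: int


@dataclass(frozen=True)
class UnicodeEntity(Entity):
    """Hypothetical marker subclass; offsets are Unicode code point indices."""


def _units(text: str) -> int:
    # UTF-16 code units in `text` (2 bytes per unit, no BOM).
    return len(text.encode("utf-16-le")) // 2


def normalize_entities(text: str, entities: List[Entity]) -> List[Entity]:
    """Convert only UnicodeEntity instances; pass other entities through."""
    out = []
    for entity in entities:
        if isinstance(entity, UnicodeEntity):
            start = _units(text[:entity.offset])
            end = _units(text[:entity.offset + entity.length])
            out.append(Entity(entity.type, start, end - start))
        else:
            out.append(entity)
    return out


text = "\U0001F600 bold \U0001F600 italic"
mixed = [UnicodeEntity("bold", 2, 4), Entity("italic", 11, 6)]
print(normalize_entities(text, mixed))
```

Dispatching on the entity's class keeps existing code that already passes UTF-16 offsets unaffected, since only entities explicitly marked as code point based are rewritten before sending.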

Describe alternatives you've considered

No response

Additional context

No response
