CDATA element fails to parse when element contains £ symbol #122

@tomtaylor

We have an XML file that has been failing to parse since we switched from calling File.stream!(path, [:compressed, :trim_bom]) to File.stream!(path, [:compressed, :trim_bom], 32_768), i.e. from reading the file line by line to reading it in 32,768-byte chunks. It throws the following error:

{:error, %Saxy.ParseError{reason: {:token, :"]]"}, binary: <<10, 60, 100, 101, 115, 99, 114, 105, 112, 116, 105, 111, 110, 62, 60, 33, 91, 67, 68, 65, 84, 65, 91, 60, 112, 62, 60, 115, 116, 114, 111, 110, 103, 62, 83, 65, 80, 32, 124, 32, 46, 78, 69, 84, 32, 124, ...>>, position: 92}}

The file is littered with empty CDATA elements. I wonder if one of those is aligning with the start/end of a buffer? I can provide the full XML file if useful - it's 114MB and I'd prefer not to provide it publicly.
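
To illustrate the buffer-boundary suspicion, here is a minimal sketch of how a fixed-size byte chunk can cut a £ in half. It is not a confirmed diagnosis: the XML snippet and the chunk size are hypothetical, chosen only so that the split lands between the two UTF-8 bytes of £ (0xC2 0xA3), and it does not call Saxy at all.

```elixir
# Hypothetical document fragment; the real file is not shown in this issue.
xml = "<description><![CDATA[£]]></description>"

# Assumed chunk size, picked so the boundary falls after the first byte of "£".
# In the real case the boundary would fall wherever a multiple of 32_768 lands.
chunk_size = byte_size("<description><![CDATA[") + 1

# Split the binary the way a byte-based stream would hand it over in two chunks.
<<first::binary-size(chunk_size), rest::binary>> = xml

IO.inspect(String.valid?(first))         # false: ends with a lone 0xC2
IO.inspect(String.valid?(rest))          # false: starts with a lone 0xA3
IO.inspect(String.valid?(first <> rest)) # true: the whole document is valid UTF-8
```

If this is what happens, neither chunk is valid UTF-8 on its own even though the file is, which would explain why line-based streaming worked and byte-based streaming does not.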
