Issue handling URL from XML · matthewmueller/x-ray#21

(8 comments) (0 reactions) (0 assignees) JavaScript (415 forks) batch import

feature, help wanted

Repository metrics

Stars: (5,804 stars)
PR merge metrics: (No merged PRs in 30d)

Description

I've been digging into this for a while, but I'm pretty stumped right now.

<item>
    <title>Greece's Varoufakis says QE to fuel unsustainable equity rally</title>
    <link>http://feeds.reuters.com/~r/reuters/businessNews/~3/EaR-N8x3VzU/story01.htm</link>
    <description>CERNOBBIO, Italy (Reuters) - The European Central Bank's bond purchases will create an unsustainable stock market rally and are unlikely to boost euro zone investments...</description>
</item>

With .select(['item title']) I'm able to get all of the titles. With .select(['item description']) I'm able to get all of the descriptions. But with .select(['item link']) I only get an array of empty strings back. The number of empty strings equals the number of items in the page.

I'm going to keep digging in, but I think I already have keyboard marks on my forehead. =|

I've tried this using a $root with a link: 'link' attribute already, but same difference.

This is the specific URL I'm scraping: http://feeds.reuters.com/reuters/businessNews?format=xml

Contributor guide

Research direction: Inspect the x-ray library's selector logic for handling XML link elements. Trace how the select method processes the XML node tree and why 'item link' returns empty strings. Check whether there is a namespace issue or incorrect parsing of child elements.
Tech stack: javascript, nodejs
Domain: backend, tooling
Issue type: Bug
Difficulty: 2
Estimated time: 1-3 hours
Activity status: Active
Clarity: Clear
Prerequisites: Git, JavaScript, Node.js
Newbie friendliness: 70
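
One hypothesis worth checking while tracing the selector logic, stated here as an assumption and not confirmed anywhere in the issue: HTML defines <link> as a void element, so a parser running in HTML mode closes it as soon as it opens and the URL text never becomes its child. The sketch below is a toy model of that behavior with no dependencies. It is not x-ray's or cheerio's code, and the names textOf and HTML_VOID_TAGS are hypothetical.

```javascript
// Toy model of void-element handling. NOT x-ray's or cheerio's actual code.
// HTML defines <link> as a void element (it can never contain children).
const HTML_VOID_TAGS = new Set(['link', 'meta', 'br', 'img', 'input', 'hr']);

// Return the text found inside each <tagName> element in `markup`.
// `voidTags` lists the tags this toy parser closes immediately on opening.
function textOf(markup, tagName, voidTags) {
  const results = [];
  // Matches either a tag (opening or closing) or a run of text.
  const tokenPattern = /<(\/?)([A-Za-z][\w:-]*)[^>]*>|([^<]+)/g;
  let collecting = false; // true while inside an open <tagName>
  let buffer = '';
  let match;
  while ((match = tokenPattern.exec(markup)) !== null) {
    const [, slash, name, text] = match;
    if (text !== undefined) {
      if (collecting) buffer += text;
      continue;
    }
    if (name.toLowerCase() !== tagName) continue;
    if (slash === '') {
      if (voidTags.has(tagName)) {
        results.push(''); // void element: closed at once, so no text
      } else {
        collecting = true;
        buffer = '';
      }
    } else if (collecting) {
      results.push(buffer.trim());
      collecting = false;
    }
    // A stray closing tag for a void element is ignored.
  }
  return results;
}

const item = `<item>
  <title>Example headline</title>
  <link>http://feeds.reuters.com/~r/reuters/businessNews/~3/EaR-N8x3VzU/story01.htm</link>
</item>`;

// HTML-style parsing: one empty string per <link>, matching the symptom.
const htmlMode = textOf(item, 'link', HTML_VOID_TAGS);
// XML-style parsing (no void tags): the URL text is preserved.
const xmlMode = textOf(item, 'link', new Set());
console.log(htmlMode, xmlMode);
```

If this turns out to be the cause, the fix would be to parse the feed as XML. cheerio, for example, accepts an xmlMode option in cheerio.load; whether and how x-ray passes such an option through is something to verify in its source.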
