Parsing HTML From A String In Ruby On Rails

Andrew Perlis
Apr 18, 2021

During my phase 5 project at Flatiron Bootcamp I ran into a problem I had not seen before. The project used an open API (no auth key required) that returned some peculiar data: several fields came back as strings with HTML markup embedded in them, when all I wanted was plain text. This was a problem because I was using the API data as seed data for my project, so that I could render what I wanted, when I wanted. This is where ActionView::Base comes into play.

Example of HTML Code being returned from the API inside of the string.

So how is this going to help with my problem? First, let’s talk about what Action View helpers do. The Rails docs describe them this way: “Action View templates are written using embedded Ruby in tags mingled with HTML. To avoid cluttering the templates with boilerplate code, a number of helper classes provide common behavior for forms, dates, and strings. It’s also easy to add new helpers to your application as it evolves.”

Now that we know what an Action View helper is, let’s look at the one we are going to use, called full_sanitizer. According to the Rails docs, the SanitizeHelper module provides a set of methods for scrubbing text of undesired HTML elements. OK, now we are getting somewhere, but how does this code work? In my project I defined a local method in my seeds.rb file (this is also where I make my API GET request). When initially written, my method looked like this.

Initial Code for parsing out HTML Code from string

Running this method, I learned that while it did strip out the HTML embedded in the string, something weird was left behind. The result contained a bunch of “\n” newline characters, and some sections had runs of extra spaces. What I was getting back was a string that looked something like this.

HTML parsed out, but “\n” characters and extra spaces still returned when checked in the Rails console.

So now we have another problem, but there is another powerful method available to us. It’s the “.squish” method, which comes from Active Support’s String extensions rather than from ActionView::Base. It strips leading and trailing whitespace and collapses every run of whitespace inside the string, including “\n” characters, into a single space, returning a neatly spaced string. Calling .squish on the sanitized string and retesting shows that the newlines and extra whitespace are gone!

Method written to remove the white space and the newline character “\n”
String being returned with no HTML.

For some of you this may be enough, but I still had an issue: my method would stop my database from seeding whenever the API returned null for the string. Most likely this is because the value arrives in Ruby as nil, and .squish is not defined on nil. I fixed this with a very simple if statement in my method: if the argument has a value, run the sanitizer and .squish on it; otherwise return an empty string, which satisfies my seed file and lets the information be passed as JSON to my server. My final code, and the call to it in my initial fetch request to the API, looked like this.

Final Code to run with the GET Request.
GET request using the RestClient gem in the seeds.rb file

With all of this in place, we can successfully grab the info from the API and remove the HTML elements it was returning. I used other helper methods in my seed file to deal with another issue, but that’s a blog topic for another day. Hopefully this helps you out if you are having a similar issue!
