The last few weeks in the Data School have seen us learning lots about APIs and web scraping in Alteryx. I thought it might be useful to write a quick series introducing some of the concepts and discussing where and how they can be useful.
First let’s go back to basics and talk about this new-fangled thing called “The Internet”. Stemming from the human desire to spread information, knowledge and learning, the Internet has become a fantastic resource where you can learn anything from Tableau and Alteryx to how to fix the kitchen sink. Or look at memes, whichever.
The Internet is made up of web pages, which are in turn made up of code. The fundamental backbone of a website is its HTML (HyperText Markup Language), and it’s this that we use to extract data from websites. Each item you see on a web page, be it the title, an image, or the text in the body, is marked up with tags (a paragraph, for instance, sits between <p> and </p>). These tags separate the HTML into sections and are tremendously useful when trying to parse out the code once you’ve downloaded it into Alteryx.
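To make the idea of tags concrete, here is a minimal sketch of what “parsing out” HTML means. It is not an Alteryx workflow. It uses Python’s standard-library html.parser on a small, made-up page (SAMPLE_HTML and the TagCollector class are hypothetical names for illustration) and records which tag each piece of visible content sits inside.

```python
from html.parser import HTMLParser

# A small, made-up page used purely for illustration.
SAMPLE_HTML = """
<html>
  <body>
    <h1>Fixing the Kitchen Sink</h1>
    <p>Turn off the water supply first.</p>
    <img src="sink.jpg" alt="A kitchen sink">
  </body>
</html>
"""

# Void elements never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}


class TagCollector(HTMLParser):
    """Records (tag, content) pairs for text and images found in the page."""

    def __init__(self):
        super().__init__()
        self.stack = []   # tags that are currently open
        self.items = []   # collected (tag, content) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # Images carry no inner text; keep their alt text instead.
            self.items.append(("img", dict(attrs).get("alt") or ""))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        # Ignore the whitespace between tags; keep real text with its enclosing tag.
        if text and self.stack:
            self.items.append((self.stack[-1], text))


collector = TagCollector()
collector.feed(SAMPLE_HTML)
collector.close()
for tag, content in collector.items:
    print(f"{tag}: {content}")
```

Each printed line pairs a tag with its content, which is essentially the structure you would be splitting out when parsing downloaded HTML in a workflow.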
Tags come with standardised default formatting: for example, the font in an <h1> header tag will be bigger than in an <h2> tag. However, this formatting and styling can be altered with CSS, or Cascading Style Sheets. Tags can be given classes (for example, <p class="summary">) that refer to sets of styling rules in a CSS stylesheet, and these classes can help to distinguish tags from one another when parsing out code.
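As a rough illustration of why classes help, the sketch below pulls out only the text inside elements carrying a given class, even when several elements share the same tag. Again, this is plain Python with the standard-library html.parser rather than Alteryx, and the page, the ClassFilter class and the texts_with_class helper are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page: two <p> tags that differ only by their class attribute.
SAMPLE_HTML = """
<div class="article">
  <h2 class="headline">Tableau Tips</h2>
  <p class="byline">By a Data Schooler</p>
  <p class="summary">Five shortcuts worth learning.</p>
</div>
"""

# Void elements have no closing tag, so they must not affect the nesting depth.
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}


class ClassFilter(HTMLParser):
    """Collects the text inside elements whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching element
        self.matches = []   # one text string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # An element can have several space-separated classes.
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1          # a nested tag inside a match
        elif self.target_class in classes:
            self.depth = 1           # entering a new match
            self.matches.append("")

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.matches[-1] += data


def texts_with_class(html, class_name):
    """Return the stripped text of every element with the given class."""
    parser = ClassFilter(class_name)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.matches]


print(texts_with_class(SAMPLE_HTML, "summary"))
print(texts_with_class(SAMPLE_HTML, "byline"))
```

Searching by tag name alone would return both paragraphs; filtering on the class picks out exactly the one you want, which is the same reasoning you would apply when parsing tags in a workflow.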
Stay tuned for Part II: What’s an API when it’s at home?