vividleft.blogg.se - Web scraper pagination

Web scraper pagination how to

Today we will play around with web scraping in F#. Web scraping can be useful for gathering and processing data from the internet. In this blog post, we will learn how to navigate through a fictional bookstore website and extract the author of every book displayed, so that we can count how many times each author appears and find out who wrote the most books.

Let's pretend our fictional bookstore site uses pagination: a list of page numbers, usually found at the bottom of the page, where clicking a number takes you to the page with that index. Also, each page has a list of books. Clicking a book redirects to a page with more information about it, including the author's name.

Looking at the page we can identify two important elements: the pagination and the list of books.

Before we go into the code, we can quickly make sure we get the right elements by using the dev console and querySelector. If you hover over the console results, your queried element should be highlighted.

Note: This will be done in an fsx file that can be run with dotnet fsi, or in VS Code if you have the Ionide extension.

We will start by adding the FSharp.Data NuGet package, which will help us with web scraping:

    #r "nuget: FSharp.Data, 5.0.2"
    open FSharp.Data

Because we will iterate over all of the pages, let's make sure we get the last page number:

    let lastPageNumber =
        HtmlDocument.Load "https://example.com/books"   // assumed: line missing from the source, URL is a placeholder
        |> fun page -> page.CssSelect(".foo")           // assumed from the description below: pagination entries
        |> Seq.filter (fun elem -> elem.HasClass("next") |> not)   // HasClass takes the class name without a dot
        |> Seq.last                                     // assumed: "get the last number"
        |> fun elem -> elem.DirectInnerText().Trim()

Using the CssSelect method we get the pagination with class name 'foo', which includes the numbers for each page. The pagination also includes the "Next" button, so we filter that out and take the last number.

Let's create a function that will get us the name from the book page:

    let nameFromBookpage (link:string) : string =
        HtmlDocument.Load link                          // assumed: line missing from the source
        |> fun page -> page.CssSelect(".book-author")
        |> Seq.head                                     // assumed: take the first matching element
        |> fun post -> post.Descendants "a" |> Seq.head
        |> fun element -> element.DirectInnerText().Trim()

We take a link as a parameter. This link goes directly to the book's page, because that's where the author's name is. The name is wrapped in a clickable element, so we look for the a tag and extract its text.

Now we are going to make a function that will extract the link to the book:

    let linkFromPost (post:HtmlNode) : string =
        post.Descendants "a"                            // assumed: lines missing from the source
        |> Seq.head                                     // assumed: the first a tag is the book link
        |> fun html -> html.AttributeValue "href"

Now that we have all these helpful functions, let's write another function! This time for getting the list of names from the page:

    let listNamesFromPage (page:HtmlDocument) : string list =
        page                                            // assumed: line missing from the source
        |> fun page -> page.CssSelect(".list-books")
        |> fun main -> main.CssSelect(".book-item")
        |> List.map (fun post -> linkFromPost post |> nameFromBookpage)
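Once the scraping functions have produced a list of author names, the counting step mentioned in the introduction is still left to do. Here is a minimal sketch of how it could look. The names `sampleNames` and `countAuthors` are hypothetical and not part of the original code, and the sample data is made up so the snippet runs without touching the network.

```fsharp
// Hypothetical sample data standing in for the scraped author names.
let sampleNames =
    [ "Ann Author"; "Bob Writer"; "Ann Author"
      "Cy Scribe"; "Ann Author"; "Bob Writer" ]

/// Counts how many times each author appears, most frequent first.
let countAuthors (names: string list) : (string * int) list =
    names
    |> List.countBy id              // group identical names and count them
    |> List.sortByDescending snd    // highest count first

let counts = countAuthors sampleNames

// The head of the sorted list is the author with the most books.
let topAuthor, topCount = List.head counts

printfn "%s wrote the most books (%d)" topAuthor topCount
```

In the real script, the input would presumably come from calling `listNamesFromPage` on every page from 1 up to `lastPageNumber` (converted to an int) and concatenating the results. How the page number is encoded in the URL depends on the site, so that part is left out here.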