Get HTML code and text inside tags

fanboy000 (GB, Member)
edited December 2012 in Xamarin.Android

I would like to download the HTML code from a URL and then get the text inside specified tags
(e.g. <div class="abc">). I've tried using HtmlAgilityPack, but it just won't work. Please help.
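If the page happens to be well-formed XHTML, one way to get at <div class="abc"> without HtmlAgilityPack is LINQ to XML (System.Xml.Linq). This is a minimal sketch that assumes the markup parses as XML; the XhtmlText class and DivTexts method are hypothetical names used only for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XhtmlText
{
    // Returns the text of every <div> whose class attribute equals className.
    // Assumption: the input is well-formed XML. Typical HTML (an unclosed <br>,
    // an &nbsp; entity, unquoted attributes) makes XDocument.Parse throw.
    public static List<string> DivTexts(string xhtml, string className)
    {
        XDocument doc = XDocument.Parse(xhtml);

        return doc.Descendants()
            // LocalName ignores the XHTML namespace, if one is declared.
            .Where(e => e.Name.LocalName == "div"
                        && (string)e.Attribute("class") == className)
            // Value concatenates all descendant text, dropping inner tags.
            .Select(e => e.Value.Trim())
            .ToList();
    }
}
```

Most real-world pages are not well-formed XML, so for arbitrary HTML a tolerant parser such as HtmlAgilityPack remains the safer choice.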

Posts

  • Cheesebaron (DK, Insider, University mod)

    I've tried using HtmlAgilityPack but it just won't work.

    What won't work?

  • fanboy000 (GB, Member)
    edited December 2012

    When I use HtmlDocument doc = new HtmlDocument(); I get the following error: Error CS0012: The type 'System.Xml.XPath.IXPathNavigable' is defined in an assembly that is not referenced. You must add a reference to assembly 'System.Xml, Version=4.0.0.0, Culture=neutral. It might be that I didn't install HAP properly. I downloaded HAP 1.4.6 from the official site and then added a reference to HtmlAgilityPack.dll in my project. I did the same in a standard C# project and it worked, but in Mono for Android I get this weird error.

  • JonathanPryor (US, Xamarin Team Xamurai)

    @fanboy000: You need to rebuild the HtmlAgilityPack source against the Mono for Android profile assemblies, which will cause HtmlAgilityPack.dll to reference System.Xml, Version=2.0.5.0, which is required for use on Mono for Android.

  • fanboy000 (GB, Member)

    I'm very new at this, so can you please explain how I do that?

  • fanboy000 (GB, Member)

    Sorry to ask so much, but at step 7 I get Build Failed (6 Errors). And where am I supposed to use the code in step 8?

  • PatRoam (US, Beta)
    edited December 2012

    Or just go quick and dirty, depending on how much text you need to parse out. This uses HttpWebRequest from the System.Net namespace:

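    using System;
    using System.IO;
    using System.Net;

    HttpWebRequest WebReq = (HttpWebRequest)WebRequest.Create("http://yourwebsite/default.html");
    WebReq.Timeout = 8000;  // milliseconds; if it's an .aspx page, give it a few seconds to warm up

    string myresponse = null;
    try
    {
        using (HttpWebResponse WebResp = (HttpWebResponse)WebReq.GetResponse())
        using (StreamReader mystreamreader = new StreamReader(WebResp.GetResponseStream()))
        {
            myresponse = mystreamreader.ReadToEnd();
        }
    }
    catch (WebException ex)
    {
        // Timeout, DNS failure, or an HTTP error status such as 404
        Console.WriteLine(ex.Message);
    }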

    myresponse now holds the page's HTML; use string functions to parse out what you need.
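A minimal sketch of the "string functions" approach, assuming you only need the first <div class="abc"> on the page, that the opening tag is written exactly as given (same spacing and quotes), and that it contains no nested <div>. HtmlSnippet.Between is a hypothetical helper, not part of any library:

```csharp
using System;

public static class HtmlSnippet
{
    // Returns the text between the first occurrence of startTag and the next
    // occurrence of endTag after it, or null if either tag is missing.
    // Quick and dirty: nested tags with the same name are not handled, and
    // any markup inside the element is returned as-is.
    public static string Between(string html, string startTag, string endTag)
    {
        int start = html.IndexOf(startTag, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return null;
        start += startTag.Length;

        int end = html.IndexOf(endTag, start, StringComparison.OrdinalIgnoreCase);
        if (end < 0) return null;

        return html.Substring(start, end - start).Trim();
    }
}
```

Usage: HtmlSnippet.Between(myresponse, "<div class=\"abc\">", "</div>") returns the trimmed contents of the first such div, or null if it is not found. IndexOf and Substring keep this dependency-free, at the cost of breaking as soon as the markup varies.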

  • fanboy000 (GB, Member)

    @jonp - Thank you for helping. It works now.

  • JonathanPryor (US, Xamarin Team Xamurai)

    @PatRoam: Once you've downloaded the page, how would you suggest parsing it? Regular expressions, perhaps? ;-)

  • PatRoam (US, Beta)

    Ahh, funny guy :D

    Seriously though, I appreciate all of the time and effort you put in here; it helps.

  • DmitriyChekaykin (US, Member)
    edited June 2013

    @johnp, wow! This is great! ty

  • KillingMoon (US, Member)

    Hi @jonp, could you provide an official Xamarin-compatible DLL of HtmlAgilityPack?

  • Hello. I've attached the HtmlAgilityPack DLL I made for an iOS project.

  • Sorry, here it is :D

  • SKall (US, Member ✭✭✭✭)

    Regular expressions are a great tool for finding complicated sequences. Below is a quick sample showing how to find the class names. The commented-out expression works for a simple string, but not when tabs and line breaks are involved. The second expression should work with a more complex string, but it has not been tested.

    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    class Program
    {
        private static readonly string[] TestStrings = new[] { "div class=\"abc_123\"" };

        static void Main(string[] args)
        {
            //var classExp = new Regex("div class=\"(?<class>[\\w]+)\"");
            // {0}: optional whitespace, {1}: at least one whitespace char,
            // {2}: the class name captured as "class", {3}: optional tab/CR/LF
            var classExp = new Regex(
                string.Format
                (
                    "div{3}{1}{3}class{0}={0}\"(?<class>{2})\"",
                    @"[\s\t\r\n]*",
                    "[\\s]+",
                    "[\\w]+",
                    @"[\t\r\n]*"
                )
            );

            // Pair each test string with the class names found in it
            var matches = TestStrings.Select(a =>
                new
                {
                    Text = a,
                    Matches = classExp.Matches(a).Cast<Match>().Select(b => b.Groups["class"].Value)
                });

            foreach (var match in matches)
            {
                Console.WriteLine(match.Text);
                foreach (var className in match.Matches)
                {
                    Console.WriteLine(className);
                }
            }
        }
    }
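Building on the same idea, the sketch below captures the contents of each matching <div> rather than just its class name, which is what the original question asks for. It is a hedged illustration, not a general HTML parser: it assumes the class attribute equals the given name exactly and that the div contains no nested <div>, because the lazy .*? stops at the first </div>. DivText.InnerTexts is a hypothetical helper name:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DivText
{
    // Returns the inner text of every <div> whose class attribute is exactly className.
    // Limitation: nested <div> elements cut the match short at the first </div>.
    public static List<string> InnerTexts(string html, string className)
    {
        var pattern = new Regex(
            "<div\\s+class\\s*=\\s*\"" + Regex.Escape(className) + "\"[^>]*>(?<inner>.*?)</div>",
            // Singleline lets . match line breaks inside the div.
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        return pattern.Matches(html)
            .Cast<Match>()
            // Strip any inner tags, then surrounding whitespace.
            .Select(m => Regex.Replace(m.Groups["inner"].Value, "<[^>]+>", "").Trim())
            .ToList();
    }
}
```

RegexOptions.Singleline is what makes the "tabs and line breaks" case work here, since without it . does not match a newline.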
    
  • bluey (US, Member)

    It would be so easy if a member of the Xamarin team would just compile an HtmlAgilityPack.dll for Android and iOS and upload it somewhere in the Components section. But no, that's too hard; let the other idiots struggle for themselves.

  • Warickus (UA, Member)

    Can someone build and post HtmlAgilityPack for Android? I have read this topic and understood nothing :(

  • Cheesebaron (DK, Insider, University mod)
  • Thanks!

  • Ross_B (US, Member ✭✭)

    @JonathanPryor Great tutorial, thanks!

  • Tsovak (US, Member)

    You have to compile it from source.

    Download the source.

    Go into

    \htmlagilitypack-99964\Branches\1.4.0\HtmlAgilityPack
    Edit the .csproj and change it to


    Save it and load the project.

    Fix the errors:

    Trace -> Debug

    Remove this block:

        if (!SecurityManager.IsGranted(new DnsPermission(PermissionState.Unrestricted)))
        {
            //do something.... not at full trust
            try
            {
                RegistryKey reg = Registry.ClassesRoot;
                reg = reg.OpenSubKey(extension, false);
                if (reg != null) contentType = (string)reg.GetValue("", def);
            }
            catch (Exception)
            {
                contentType = def;
            }
        }

    Remove this block as well:

        if (SecurityManager.IsGranted(new RegistryPermission(PermissionState.Unrestricted)))
        {
            try
            {
                RegistryKey reg = Registry.ClassesRoot;
                reg = reg.OpenSubKey(@"MIME\Database\Content Type\" + contentType, false);
                if (reg != null) ext = (string)reg.GetValue("Extension", def);
            }
            catch (Exception)
            {
                ext = def;
            }
        }

    Use the DLL in the bin/Debug folder.

  • Tsovak (US, Member)

    https://github.com/Tsovak/AndroidHtmlAgilityPack
    Fixes more problems on Android and builds a DLL ready to use.
