simplehtmldom Failing

Hello,

I do not understand why simplehtmldom is failing.
This is the code:

<?php

ini_set('display_errors',1);
ini_set('display_startup_errors',1);
error_reporting(E_ALL);

  //---
  include_once('simplehtmldom_1_9_1/simple_html_dom.php');
  //---
  $url = "https://victoriousseo.com/sitemap_index.xml";
  $html = new simple_html_dom();
  $html->load_file($url);
  //--
  foreach($html->find("a") as $link)
  {
    echo $link->href."<br />";
  }
?>

I have the simplehtmldom.php in the same directory as the file that contains the crawler code above.
I get no error.

And the sitemap does exist; check it if you must:
https://victoriousseo.com/sitemap_index.xml

So I do not understand why the code extracts nothing. I see a completely blank page.
The code is from a tutorial:
http://timvanosch.blogspot.com/2013/02/php-tutorial-making-webcrawler.html
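
A possible explanation, offered as a sketch rather than a definitive diagnosis: a sitemap index is XML built from <sitemap> and <loc> elements, so find("a") most likely has no anchor tags to match and the loop prints nothing. The example below reads the <loc> entries with PHP's built-in DOMDocument instead of simplehtmldom. The function name extract_sitemap_locs() and the example.com URLs are made up for illustration, and the inline sample stands in for the downloaded sitemap.

```php
<?php
// Hypothetical helper: returns every <loc> URL found in a sitemap or sitemap index.
function extract_sitemap_locs(string $xml): array
{
    $doc = new DOMDocument();
    // loadXML() rejects an empty string and returns false on malformed XML.
    if ($xml === '' || !@$doc->loadXML($xml)) {
        return [];
    }
    $urls = [];
    foreach ($doc->getElementsByTagName('loc') as $node) {
        $urls[] = trim($node->textContent);
    }
    return $urls;
}

// Made-up sample in the sitemap index format; real code would fetch the XML first.
$sample = <<<XML
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/post-sitemap.xml</loc></sitemap>
  <sitemap><loc>https://example.com/page-sitemap.xml</loc></sitemap>
</sitemapindex>
XML;

foreach (extract_sitemap_locs($sample) as $url) {
    echo $url, "\n";
}
```

Each URL in a sitemap index points to another sitemap, so a crawler would presumably run the same function again on each of those files to reach the page URLs.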

Php Xml Sitemap Crawler Tutorial Sought

Hello,

I am building a search engine with PHP. It is nearly finished.
Now I need to build the web crawler.
Since most websites have an XML sitemap that web crawlers use to find the site's links, I prefer to build an XML sitemap crawler rather than a general HTTP crawler.
I am not having much luck finding a PHP tutorial on it, using these keywords on Google:

"sitemap crawler"+"php tutorial" OR "php", -"sitemap generator"

If you know of any tutorial or free code, let me know.
The simpler the code, the better.
I will keep this thread updated with what I find. If you find anything better than mine, drop me a line here.

Thanks

How To Do Fuzzy Match Sql Query ?

Hi,

Is this MySQL query (with PHP) correct?
It is meant to be a wildcard match (fuzzy match) query.
Imagine there are 7 columns in the MySQL table.

$sql = "SELECT * from $tbl WHERE $col_1 LIKE ? OR $col_2 LIKE ? OR $col_3 LIKE ? OR $col_4 LIKE ? OR $col_5 LIKE ? OR $col_6 LIKE ? OR $col_7 LIKE ? LIMIT $limit OFFSET $offset";
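
One way the query above could be assembled, as a sketch under stated assumptions: placeholders can only stand for values, so the table and column names are checked against a whitelist before they are interpolated, and the search term is wrapped in % so that LIKE does a substring match. The helper name build_like_query() and the whitelisted names are invented for the example. Note that LIKE is a pattern match rather than a true fuzzy match.

```php
<?php
// Hypothetical helper: builds a LIKE query over whitelisted columns.
// Returns [sql, params]; params are the values to bind to the ? placeholders.
function build_like_query(string $tbl, array $cols, string $term, int $limit, int $offset): array
{
    // Assumed whitelists; replace with the real table and column names.
    $allowedTables = ['links'];
    $allowedCols   = ['domain', 'email_domain', 'title'];

    if (!in_array($tbl, $allowedTables, true)) {
        throw new InvalidArgumentException('Unknown table');
    }
    // Keep only known columns, in the order they were requested.
    $cols = array_values(array_intersect($cols, $allowedCols));
    if ($cols === []) {
        throw new InvalidArgumentException('No valid columns');
    }

    // One "col LIKE ?" per column, joined with OR.
    $where = implode(' OR ', array_map(fn($c) => "`$c` LIKE ?", $cols));
    // $limit and $offset are typed as int, so interpolating them is safe.
    $sql = "SELECT * FROM `$tbl` WHERE $where LIMIT $limit OFFSET $offset";

    // Escape LIKE wildcards in the term, then wrap it in % for a substring match.
    $pattern = '%' . addcslashes($term, '%_\\') . '%';
    $params  = array_fill(0, count($cols), $pattern);

    return [$sql, $params];
}

[$sql, $params] = build_like_query('links', ['domain', 'email_domain'], 'brute', 10, 0);
echo $sql, "\n";
```

The returned $params array holds one pattern per placeholder, ready to be bound with mysqli_stmt_bind_param() or passed to execute().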

Directory or Product Listing Categories

Hi,

Do you remember when Yahoo and the Open Directory (DMOZ) were directories with categories, subcategories, sub-subcategories and so on?
I need a list of such link listing categories.
I am trying to build a link directory.
Where can I get lists of popular categories that websites would find handy for listing their links? I do not want to create categories and subcategories that hardly get any listings.
Googling for such lists is yielding no results tonight.

Why Inc & Decrement Fails ?

Hi,

$page = 10;
echo 'page ' .$page; //echoes: 10

echo 'backward ' .$backward = $page--; //echoes: 10
echo "<br>"; echo "<br>";
echo 'forward ' .$forward = $page++; //echoes: 9
echo "<br>"; echo "<br>";
Strange!

Why is the decrement failing?
Since $page = 10, decrementing it ($backward) should give 9. But it echoes 10!

Again, since $page = 10, incrementing it ($forward) should give 11. But it echoes 9!
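
For comparison, a minimal sketch of how PHP's operators behave here: $page-- and $page++ are the postfix forms, which hand back the old value and only then change $page, while --$page and ++$page change it first. For previous/next page numbers, plain arithmetic avoids modifying $page at all. The variable names $old and $new are only for illustration.

```php
<?php
$page = 10;

// Post-decrement returns the old value first, then changes $page.
$old = $page--;   // $old is 10, $page is now 9

// Pre-decrement changes $page first, then returns the new value.
$new = --$page;   // $page is now 8, $new is 8

// For pagination links, plain arithmetic leaves $page untouched.
$page = 10;
$backward = $page - 1; // 9
$forward  = $page + 1; // 11

echo "backward $backward, forward $forward\n"; // prints "backward 9, forward 11"
```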

Which Mysqli Error Report To Use ?

Hi,

I am not sure which mysqli error reporting function to use here.
Which of the two is correct ?

if(!mysqli_stmt_prepare($stmt,$sql_count))
    {
        echo __LINE__; echo '<br>';//DELETE

        echo 'Mysqli Error: ' .mysqli_stmt_error(); //DEV MODE.
        echo '<br>';
        echo 'Mysqli Error No: ' .mysqli_stmt_errno(); //DEV MODE.
        echo '<br>';
        die('Registration a Failure!');
}
if(!mysqli_stmt_prepare($stmt,$sql_count))
    {
        echo __LINE__; echo '<br>';//DELETE

        echo 'Mysqli Error: ' .mysqli_error(); //DEV MODE.
        echo '<br>';
        echo 'Mysqli Error No: ' .mysqli_errno(); //DEV MODE.
        echo '<br>';
        die('Registration a Failure!');
}
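
A hedged sketch of how the two families of functions are called: mysqli_stmt_error() and mysqli_stmt_errno() take the statement handle, while mysqli_error() and mysqli_errno() take the connection handle, so neither works without an argument. The helper name prepare_or_report() and the $conn parameter (an already open mysqli connection) are assumptions for the example. Since PHP 8.1 mysqli throws exceptions by default, so the failure may arrive as a mysqli_sql_exception instead of a false return value.

```php
<?php
// Hypothetical helper; $conn is assumed to be an open mysqli connection.
function prepare_or_report(mysqli $conn, string $sql): ?mysqli_stmt
{
    try {
        $stmt = mysqli_stmt_init($conn);
        if (mysqli_stmt_prepare($stmt, $sql)) {
            return $stmt;
        }
        // Statement-level functions take the statement handle.
        error_log('Stmt error ' . mysqli_stmt_errno($stmt) . ': ' . mysqli_stmt_error($stmt));
        // Connection-level functions take the connection handle.
        error_log('Conn error ' . mysqli_errno($conn) . ': ' . mysqli_error($conn));
    } catch (mysqli_sql_exception $e) {
        // Reached when mysqli is in exception mode (the default since PHP 8.1).
        error_log('Prepare failed: ' . $e->getMessage());
    }
    return null;
}
```

Writing the details to the error log rather than echoing them keeps the SQL error text away from visitors once the site leaves dev mode.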

rawurlencode() Questions

Hello,

You use rawurlencode() on the file path. Does that mean you have to exclude the domain name part?

<?php
echo '<a href="http://example.com/'.
    rawurlencode('Sales and Marketing').
    '/search?'.
    'query='.urlencode('Monthly Report').
    '">Click Me</a>';
?>

Imagine the above is my link listed on one of my pages.
Now, why is it necessary for me to rawurlencode() my own site's file path when I put the above link on my pages? How could an XSS attack be done here?
Or is rawurlencode() not really necessary here unless I am echoing user-submitted links?
E.g. my page getting the URL from my MySQL db via $_GET[].
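
A small sketch of what each function does to the pieces of the link above. It suggests that the encoding is applied per path segment or query value, not to the scheme and domain, and that its job is to keep the URL valid; escaping for HTML output is a separate step done by htmlspecialchars(). The variable names are only for illustration.

```php
<?php
// Encode one path segment and one query value separately.
$path  = rawurlencode('Sales and Marketing'); // spaces become %20
$query = urlencode('Monthly Report');         // spaces become +

$url = 'http://example.com/' . $path . '/search?query=' . $query;
echo $url, "\n";

// Escape the finished URL for the HTML attribute it is printed into.
echo '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">Click Me</a>', "\n";

// Encoding the whole URL, domain included, also encodes ":" and "/".
$broken = rawurlencode('http://example.com/Sales and Marketing');
echo $broken, "\n";
```

With hard-coded text like this there is no user input to attack; the encoding matters because a raw space is not valid inside a URL.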

Why Has This Become 1 INT ? And Why Has That Become 0 INT ?

Hello,

$c = 'bool'; // boolean
settype($c, "integer"); // $c is now integer (1)
echo $c;

Why has $c become the int of 1 and not any other number ?
And why is the following set to an int of 0 and not any other number ?

$c = 'string'; // boolean
settype($c, "integer"); // $c is now integer (0)
echo $c;
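
A quick sketch for checking the conversions directly. As far as I can tell, 'bool' in quotes is an ordinary string, not a boolean, and a string with no leading digits converts to 0, so both snippets above should print 0; it is the boolean true that converts to 1. The helper to_int() is made up for the example.

```php
<?php
// Hypothetical helper: converts a copy of the value with settype() and returns it.
function to_int($value): int
{
    settype($value, 'integer');
    return $value;
}

echo to_int(true), "\n";     // 1: boolean true converts to 1
echo to_int(false), "\n";    // 0: boolean false converts to 0
echo to_int('bool'), "\n";   // 0: a string without leading digits converts to 0
echo to_int('string'), "\n"; // 0: same rule
echo to_int('12abc'), "\n";  // 12: leading digits are kept
```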

Is htmlspecialchars() Necessary Here ?

Hello,

One of the following two code samples uses htmlspecialchars(). Which of the two is correct?
Both build a pagination section. I need to add security so users cannot inject SQL.
I am not using the http_build_query() function here, as I want to build a pagination section without it; I have already built one with http_build_query(). I am just learning different ways to build a pagination section: the old way and the new way. OK?

Page Format 1: https://localhost/Work/buzz/Templates/Pagination_TEMPLATE.php?tbl=links&bool=null&col_1=domain&input_1=brute.com&lmt=1&pg=1

Page Format 2: https://localhost/Work/buzz/Templates/Pagination_TEMPLATE.php?tbl=links&bool=null&col_1=domain&col_2=email_domain&input_1=brute.com&input_2=brute.com&lmt=1&pg=1

$i = 0;
while($i<$total_pages)
{
    $i++;
    if($bool=='and' || $bool=='or')
    {
        $serps_url = $_SERVER['PHP_SELF'].'?'.'tbl='.urlencode($tbl).'&'.'col_1='.urlencode($col_1).'&'.'col_2='.urlencode($col_2).'&'.'bool='.$bool.'&'.'input_1='.urlencode($input_1).'&'.'input_2='.urlencode($input_2).'&'.'lmt='.intval($limit).'&'.'pg='.intval($i);
    }
    else
    {
        $serps_url = $_SERVER['PHP_SELF'].'?'.'tbl='.urlencode($tbl).'&'.'col_1='.urlencode($col_1).'&'.'bool='.urlencode($bool).'&'.'input_1='.urlencode($input_1).'&'.'lmt='.intval($limit).'&'.'pg='.intval($i);
    }
    if($i==$page)
    {
        echo "<a href=\"$serps_url\"><b>$i</b></a>";
    }
    else
    {
        echo "<a href=\"$serps_url\">$i</a>";
    }
}

Thank you.

    $i = 0;
    while($i<$total_pages)
    {
        $i++;
        if($bool=='and' || $bool=='or')
        {
            $serps_url = $_SERVER['PHP_SELF'].'?'.'tbl='.urlencode($tbl).'&'.'col_1='.urlencode($col_1).'&'.'col_2='.urlencode($col_2).'&'.'bool='.$bool.'&'.'input_1='.urlencode($input_1).'&'.'input_2='.urlencode($input_2).'&'.'lmt='.intval($limit).'&'.'pg='.intval($i);
        }
        else
        {
            $serps_url = $_SERVER['PHP_SELF'].'?'.'tbl='.urlencode($tbl).'&'.'col_1='.urlencode($col_1).'&'.'bool='.urlencode($bool).'&'.'input_1='.urlencode($input_1).'&'.'lmt='.intval($limit).'&'.'pg='.intval($i);
        }
        if($i==$page)
        {
            echo '<a href="' .htmlspecialchars($serps_url) .'">' ."<b>$i</b>" .'</a>';
        }
        else
        {
            echo '<a href="' .htmlspecialchars($serps_url) .'">' ."$i" .'</a>';
        }
    }
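
A sketch of the difference between the two versions, under the assumption that the concern is what ends up in the HTML: urlencode() protects the individual values, but the & separators and the script path are still raw, so htmlspecialchars() is what makes the finished URL safe inside the href attribute. That is protection against broken markup and XSS; SQL injection is a separate matter, usually handled with prepared statements. The helper page_link() and its shortened parameter list are invented for the example.

```php
<?php
// Hypothetical helper: one pagination link, built by hand with urlencode().
function page_link(string $script, string $tbl, string $input, int $pg): string
{
    $url = $script . '?tbl=' . urlencode($tbl) . '&input_1=' . urlencode($input) . '&pg=' . $pg;
    // Escape for the HTML attribute: & becomes &amp;, quotes and angle brackets become entities.
    return '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">' . $pg . '</a>';
}

echo page_link('/Pagination_TEMPLATE.php', 'links', 'brute.com', 2), "\n";

// A script path with markup appended, similar to what PHP_SELF can contain.
echo page_link('/x.php/"><script>alert(1)</script>', 'links', 'a', 1), "\n";
```

In the second call the quote and angle brackets come out as entities, so the appended markup stays inside the attribute value instead of becoming a tag.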

Which Pagination Section Code Is Correct Using http_build_query() ?

Hello,

I know the safest way to write a pagination section with PHP is to use http_build_query().
Like so:

$i = 0;
while($i<$total_pages)
{
    $i++;
    if($_GET['bool']=='null')
    {
        //Page Format: $_GET List.
        $array = array("tbl"=>"$tbl","col_1"=>"$col_1","bool"=>"$bool","input_1"=>"$input_1","lmt"=>"$limit","pg"=>"$i");
    }
    else
    {
        //Page Format: $_GET List.
        $array = array("tbl"=>"$tbl","col_1"=>"$col_1","col_2"=>"$col_2","bool"=>"$bool","input_1"=>"$input_1","input_2"=>"$input_2","lmt"=>"$limit","pg"=>"$i");
    }

    $serps_url = $_SERVER['PHP_SELF'].'?'.http_build_query($array);

    if($i==$page)
    {
        echo '<a href="' .htmlspecialchars($serps_url) .'">' ."<b>$i</b>" .'</a>';
    }
    else
    {
        echo '<a href="' .htmlspecialchars($serps_url) .'">' ."$i" .'</a>';
    }
}

I believe the above code is buggy because there is no need to use the htmlspecialchars() here.
Am I correct ?
Is the following code ok or not ?

$i = 0;
while($i<$total_pages)
{
    $i++;
    if($_GET['bool']=='null')
    {
        //Page Format: $_GET List.
        $array = array("tbl"=>"$tbl","col_1"=>"$col_1","bool"=>"$bool","input_1"=>"$input_1","lmt"=>"$limit","pg"=>"$i");
    }
    else
    {
        //Page Format: $_GET List.
        $array = array("tbl"=>"$tbl","col_1"=>"$col_1","col_2"=>"$col_2","bool"=>"$bool","input_1"=>"$input_1","input_2"=>"$input_2","lmt"=>"$limit","pg"=>"$i");
    }

    $serps_url = $_SERVER['PHP_SELF'].'?'.http_build_query($array);

    if($i==$page)
    {
        echo '<a href="' .$serps_url .'">' ."<b>$i</b>" .'</a>';
    }
    else
    {
        echo '<a href="' .$serps_url .'">' ."$i" .'</a>';
    }
}

Page Format 1: https://localhost/Work/buzz/Templates/Pagination_TEMPLATE.php?tbl=links&bool=null&col_1=domain&input_1=brute.com&lmt=1&pg=1

Page Format 2: https://localhost/Work/buzz/Templates/Pagination_TEMPLATE.php?tbl=links&bool=null&col_1=domain&col_2=email_domain&input_1=brute.com&input_2=brute.com&lmt=1&pg=1
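
A sketch that may help settle the question, with the caveat that it only shows what the functions return: http_build_query() percent-encodes the values but joins them with a plain &, and $_SERVER['PHP_SELF'] can carry extra path text appended by the visitor, so keeping htmlspecialchars() when the URL is echoed into an href still looks like the safer of the two versions. The parameter values are taken from Page Format 1; the hard-coded path stands in for PHP_SELF.

```php
<?php
$params = [
    'tbl' => 'links', 'col_1' => 'domain', 'bool' => 'null',
    'input_1' => 'brute.com', 'lmt' => 1, 'pg' => 2,
];

// The separator is passed explicitly so the result does not depend on php.ini.
$query = http_build_query($params, '', '&');
echo $query, "\n"; // tbl=links&col_1=domain&bool=null&input_1=brute.com&lmt=1&pg=2

// Values are percent-encoded, but the & between pairs is left as-is.
$tricky = http_build_query(['q' => 'a&b "c"'], '', '&');
echo $tricky, "\n"; // q=a%26b+%22c%22

// Path stands in for $_SERVER['PHP_SELF'] in this sketch.
$serps_url = '/Work/buzz/Templates/Pagination_TEMPLATE.php?' . $query;
$href = htmlspecialchars($serps_url, ENT_QUOTES, 'UTF-8');
echo '<a href="' . $href . '">2</a>', "\n";
```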