Convert Accented Characters

Useful when, for instance, you want to use a string as part of a URL and need to strip its accents to make it safe for that kind of use.

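function replace_accents($str) {
   // Turn accented characters into named entities, e.g. "é" becomes "&eacute;"
   $str = htmlentities($str, ENT_COMPAT, "UTF-8");
   // Keep only the base letter of each accent entity, e.g. "&eacute;" becomes "e"
   $str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde);/','$1',$str);
   // Decode any remaining entities with the same flags and charset used above
   return html_entity_decode($str, ENT_COMPAT, "UTF-8");
}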

Comments

  1. Jan-Marten de Boer

    Or you could just use the functions that are already built into PHP:

    
    $unsafe = 'Hello Daniël';
    $safe = urlencode($unsafe);
    // PHP has already URL-decoded the values in $_GET, so no urldecode() call is needed
    $transferred = $_GET['data'];
    
    • Jan-Marten de Boer

      Passing the $unsafe value through urlencode() gives you “Hello+Dani%C3%ABl”; rawurlencode() encodes the space as %20 instead and gives “Hello%20Dani%C3%ABl”.
      Those are all valid characters inside a URL. It even works on so-called multibyte characters, which are used in Korean, Japanese, Mandarin and similar languages. I found that out the hard way when a survey system started chopping up comments.

      The JavaScript equivalents are encodeURI / encodeURIComponent, and to reverse them you use decodeURI / decodeURIComponent.

      
      var response, safe, unsafe = 'Hello Daniël';
      safe = encodeURIComponent(unsafe);
      response = decodeURIComponent(ajaxResponse);
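
The difference between the two JavaScript encoders mentioned above is easy to miss, so here is a minimal sketch. The sample string is an assumption made up for illustration; only the built-in functions are used.

```javascript
// Sketch: how the two built-in encoders treat the same input.
const unsafe = 'Hello Daniël & co/2';

// encodeURI leaves URL delimiters such as "&" and "/" untouched,
// so it is meant for encoding a whole URL.
const wholeUrl = encodeURI(unsafe);

// encodeURIComponent escapes those delimiters as well,
// so it is meant for a single query-string value or path segment.
const component = encodeURIComponent(unsafe);

console.log(wholeUrl);   // Hello%20Dani%C3%ABl%20&%20co/2
console.log(component);  // Hello%20Dani%C3%ABl%20%26%20co%2F2

// Decoding the component form gives back the original string.
console.log(decodeURIComponent(component) === unsafe); // true
```

For a value that ends up in a query string, encodeURIComponent is usually the safer choice, since a stray "&" would otherwise start a new parameter.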
      
    • parmod

      function replace_bishnoi($str) {
      $str = htmlentities($str, ENT_COMPAT, "UTF-8");
      $str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde);/','$1',$str);
      return html_entity_decode($str);
      }

  2. Jan-Marten de Boer

    To go the other way around: when facing encoded characters, you might want the plain character that is closest to the accented variant. For example, you want to store a song with DJ Tiësto in the title, but want it to come out as DJ Tiesto when creating a filename in your script.

    
    function transform($title){
    	// Decode HTML entities, including numeric ones like &#235;, into UTF-8 characters
    	$title = html_entity_decode($title, ENT_QUOTES, 'UTF-8');

    	// Replace accented characters with their closest unaccented letters
    	$title = str_replace(
    		array("à","á","â","ã","ä","å","ç","è","é","ê","ë","ì","í","î","ï","ñ",
    			"ò","ó","ô","õ","ö","ù","ú","û","ü","ý","ÿ","þ"),
    		array("a","a","a","a","a","a","c","e","e","e","e","i","i","i","i","n",
    			"o","o","o","o","o","u","u","u","u","y","y","th"),
    		$title
    	);
    	// Strip punctuation and characters that are not allowed in filenames
    	$title = str_replace(
    		array('.','!',',',':',';',"'",'"','\\','/','*','?','<','>','|','+','%'),
    		'',
    		$title
    	);
    	// Collapse runs of spaces into a single space
    	while(strpos($title,'  ')!==false){
    		$title=str_replace('  ',' ',$title);
    	}
    	return $title;
    }
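
The transform() function first has to turn numeric character references such as &#235; back into real characters. As a rough illustration of that single step, here is a JavaScript sketch. The helper name decodeNumericEntities is hypothetical, and it only handles decimal references that end in a semicolon.

```javascript
// Hypothetical helper: decodes decimal references like "&#235;" only.
function decodeNumericEntities(str) {
  return str.replace(/&#(\d+);/g, (match, digits) => {
    const codePoint = Number(digits);
    // Leave the text untouched if the number is not a valid code point.
    if (codePoint > 0x10ffff) {
      return match;
    }
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericEntities('DJ Ti&#235;sto')); // DJ Tiësto
```

Named entities (&euml;) and hexadecimal references (&#xEB;) are deliberately left out to keep the sketch short.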
    
  3. Rodney Rehm

    The original snippet doesn’t cut it. Unicode characters that aren’t covered by htmlentities() are ignored altogether. If you want to (safely) transform a UTF-8 string to alphanumeric for use in URLs, give urlify() a shot.
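
One way to cover more characters than a fixed entity list is Unicode normalization. The sketch below is not the urlify() library mentioned above; it is a different, simpler technique, shown in JavaScript with a hypothetical helper named slugify. It assumes the goal is a lowercase, hyphen-separated string.

```javascript
// Hypothetical helper, not the urlify() library.
function slugify(str) {
  return str
    .normalize('NFD')                 // split "ë" into "e" plus a combining diaeresis
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')      // any other run of characters becomes one hyphen
    .replace(/^-+|-+$/g, '');         // trim hyphens from both ends
}

console.log(slugify('Hello Daniël'));      // hello-daniel
console.log(slugify('DJ Tiësto: Live!'));  // dj-tiesto-live
```

Letters that have no decomposition, such as "ø" or "ß", are not transliterated by this approach and end up as hyphens, so a lookup table is still needed for those.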

  4. Julien

    I have done this:

    function replace_accents($str) {
    $str = htmlentities($str);
    $str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde|cedil|elig|ring|th|slash|zlig|horn);/','$1',$str);
    return html_entity_decode($str);
    }

