Convert Accented Characters

This is useful, for instance, when you want to use a string as part of a URL and need to make it safe for that kind of use.

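function replace_accents($str) {
   // Turn accented characters into named entities, e.g. "ë" becomes "&euml;"
   $str = htmlentities($str, ENT_COMPAT, "UTF-8");
   // Keep only the base letter of each accent entity: "&euml;" becomes "e"
   $str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde);/','$1',$str);
   // Decode whatever entities are left, with the same flags and charset as above
   return html_entity_decode($str, ENT_COMPAT, "UTF-8");
}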
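
As a rough usage sketch, the function can feed a small slug builder. The `make_slug()` helper below is hypothetical and not part of the original snippet, and the example assumes the file is saved as UTF-8. `replace_accents()` is repeated only so the block runs on its own.

```php
<?php
// Copy of the snippet above, repeated so this example is self-contained.
function replace_accents($str) {
    $str = htmlentities($str, ENT_COMPAT, "UTF-8");
    $str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde);/', '$1', $str);
    return html_entity_decode($str, ENT_COMPAT, "UTF-8");
}

// Hypothetical helper (an assumption for illustration): build a URL slug.
function make_slug($str) {
    $str = strtolower(replace_accents($str));
    // Collapse every run of characters other than a-z and 0-9 into one hyphen.
    $str = preg_replace('/[^a-z0-9]+/', '-', $str);
    // Remove hyphens left over at either end.
    return trim($str, '-');
}

echo replace_accents("Crème Brûlée"), "\n";      // Creme Brulee
echo make_slug("Crème Brûlée à la carte"), "\n"; // creme-brulee-a-la-carte
```

Lowercasing after the accents are removed keeps `strtolower()` working on plain ASCII only, so it never has to deal with multibyte characters.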

Comments

  1. Or you can just use the functions that are already built into PHP ><

    
    $unsafe = 'Hello Daniël';
    $safe = urlencode($unsafe);
    // PHP has already URL-decoded the values in $_GET, so no urldecode() call is needed
    $transferred = $_GET['data'];
    
    • The $unsafe value pulled through urlencode gives you this: “Hello+Dani%C3%ABl” (rawurlencode gives “Hello%20Dani%C3%ABl”, with %20 instead of + for the space).
      Those are all valid characters inside a URL. It even works on so-called multibyte characters, which are used in Korean, Japanese, Mandarin and similar languages. I found that out the hard way when some survey system started chopping up comments.

      The JavaScript equivalent is encodeURI / encodeURIComponent and to reverse that you use decodeURI / decodeURIComponent.

      
      var response, safe, unsafe = 'Hello Daniël';
      safe = encodeURIComponent(unsafe);
      response = decodeURIComponent(ajaxResponse);
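
Back in PHP, a short sketch of how the two built-in encoders differ may help. It uses only `urlencode()`, `rawurlencode()` and their decoding counterparts, and assumes the file is saved as UTF-8.

```php
<?php
$unsafe = 'Hello Daniël'; // assumes this file is saved as UTF-8

echo urlencode($unsafe), "\n";    // Hello+Dani%C3%ABl   (space becomes "+")
echo rawurlencode($unsafe), "\n"; // Hello%20Dani%C3%ABl (space becomes "%20")

// Each encoder round-trips through its matching decoder.
var_dump(urldecode(urlencode($unsafe)) === $unsafe);       // bool(true)
var_dump(rawurldecode(rawurlencode($unsafe)) === $unsafe); // bool(true)
```

The `+` form is the one used for form data and query strings. For a path segment, `rawurlencode()` is usually the safer choice, because a literal `+` in a path is not read as a space.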
      
  2. To go the other way around: when facing encoded characters, you might want the plain character that most closely resembles the accented one. E.g., you want to store a song with DJ Tiësto in the title, but want it to turn out like DJ Tiesto when creating a filename in your script.

    
    function transform($title){
    	// Decode numeric and named entities (e.g. &#235; or &euml;) into UTF-8 characters
    	$title = html_entity_decode($title, ENT_QUOTES, 'UTF-8');

    	// Map accented characters to their closest plain letter
    	// (this file must be saved as UTF-8 for the literals to match)
    	$title = str_replace(
    		array("ì","í","î","ï","ù","ú","û","ü","ç","è","é","ê","ë",
    			"à","á","â","ã","ä","å","ý","ÿ","þ"),
    		array("i","i","i","i","u","u","u","u","c","e","e","e","e",
    			"a","a","a","a","a","a","y","y","th"),
    		$title
    	);
    	// Drop punctuation
    	$title = str_replace(array(".","!",",",":",";","'","\""), '', $title);
    	// Drop characters that are not allowed in filenames
    	$title = str_replace(
    		array('\\','/',':','"','*','?','<','>','|','+','%'),
    		'',
    		$title
    	);
    	// Collapse runs of spaces into a single space
    	while(strpos($title,'  ')!==false){
    		$title=str_replace('  ',' ',$title);
    	}
    	return $title;
    }
    
  3. The original snippet doesn’t cut it. Unicode characters that aren’t covered by htmlentities() are ignored altogether. If you (safely) want to transform a UTF-8 string to alphanumeric for use in URLs, give urlify() a shot.
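
To illustrate the gap described in this comment, here is a minimal sketch of a different technique: Unicode decomposition followed by stripping the combining marks. It is not what urlify() does internally. The `strip_combining_marks()` name is hypothetical, and the code assumes PHP's intl extension (for the `Normalizer` class) and a UTF-8 source file.

```php
<?php
// Hypothetical helper; relies on the Normalizer class from the intl extension.
function strip_combining_marks($str) {
    // NFD splits "ř" into "r" followed by a combining caron (U+030C).
    $decomposed = Normalizer::normalize($str, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $str; // normalization failed (e.g. invalid UTF-8); leave input untouched
    }
    // \p{Mn} matches nonspacing marks; the /u flag makes the pattern UTF-8 aware.
    return preg_replace('/\p{Mn}+/u', '', $decomposed);
}

// Only run the demo when the intl extension is actually loaded.
if (class_exists('Normalizer')) {
    echo strip_combining_marks('Dvořák'), "\n"; // Dvorak
}
```

Letters that have no decomposed form, such as "ł" or "ø", pass through unchanged, so a lookup table is still needed for those.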

  4. Julien

    I have done this:

    function replace_accents($str) {
    $str = htmlentities($str);
    $str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde|cedil|elig|ring|th|slash|zlig|horn);/','$1',$str);
    return html_entity_decode($str);
    }
