Convert Accented Characters

Replaces accented characters with their unaccented equivalents. This is useful, for instance, when you want to use a string as part of a URL and need to make it safe for that kind of use.

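function replace_accents($str) {
   // Encode accented characters as named entities, e.g. "é" becomes "&eacute;"
   $str = htmlentities($str, ENT_COMPAT, "UTF-8");
   // Keep only the base letter of each accent entity: "&eacute;" becomes "e"
   $str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde);/','$1',$str);
   // Decode whatever entities remain, using the same flags and charset as above
   return html_entity_decode($str, ENT_COMPAT, "UTF-8");
}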
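
As a rough sketch of how the function might be used for URLs, the hypothetical `make_slug()` helper below (it is not part of the original snippet) strips the accents, lowercases the result, and collapses everything that is not a letter or digit into hyphens. It assumes UTF-8 input and a UTF-8 encoded source file, and it repeats `replace_accents()` only so that the block runs on its own.

```php
<?php
// Same technique as the snippet above, repeated so this block is self-contained.
function replace_accents($str) {
    $str = htmlentities($str, ENT_COMPAT, 'UTF-8');
    $str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde);/', '$1', $str);
    return html_entity_decode($str, ENT_COMPAT, 'UTF-8');
}

// Hypothetical helper (an assumption for illustration): build a lowercase,
// hyphen-separated slug from a title.
function make_slug($title) {
    $slug = strtolower(replace_accents($title));
    // Replace every run of characters other than a-z and 0-9 with one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    // Drop hyphens left over at either end.
    return trim($slug, '-');
}

echo make_slug('Crème brûlée à la carte'), "\n"; // creme-brulee-a-la-carte
```

Characters that the regular expression does not cover (a cedilla, for example) survive `replace_accents()` unchanged and are then turned into hyphens by `make_slug()`, so the pattern may need extending for other languages.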

Comments

  1. Or you can just use the functions that are already built into PHP:

    
    $unsafe = 'Hello Daniël';
    $safe = urlencode($unsafe);
    $transferred = $_GET['data']; // PHP has already URL-decoded the values in $_GET
    
    • Passing the $unsafe value through urlencode gives you “Hello+Dani%C3%ABl” (rawurlencode encodes the space as %20 instead of +).
      Those are all valid characters inside a URL. It even works on so-called multibyte characters, which are used in Korean, Japanese, Mandarin and similar languages. I found that out the hard way when a survey system started chopping up comments.

      The JavaScript equivalents are encodeURI / encodeURIComponent; to reverse them, use decodeURI / decodeURIComponent.

      
      var response, safe, unsafe = 'Hello Daniël';
      safe = encodeURIComponent(unsafe);
      response = decodeURIComponent(ajaxResponse);
      
  2. To go the other way around, when facing encoded characters, you might want the plain character that is closest to an accented variant. E.g. you want to store a song with DJ Tiësto in the title, but want it to come out as DJ Tiesto when your script creates a filename.

    
    function transform($title){
    	// Decode numeric and named entities (e.g. &#235; or &euml;) into UTF-8 characters
    	$title = html_entity_decode($title, ENT_QUOTES, 'UTF-8');

    	// Map accented characters to their closest ASCII equivalent
    	$title = strtr($title, array(
    		"à"=>"a","á"=>"a","â"=>"a","ã"=>"a","ä"=>"a","å"=>"a",
    		"ç"=>"c",
    		"è"=>"e","é"=>"e","ê"=>"e","ë"=>"e",
    		"ì"=>"i","í"=>"i","î"=>"i","ï"=>"i",
    		"ù"=>"u","ú"=>"u","û"=>"u","ü"=>"u",
    		"ý"=>"y","ÿ"=>"y","þ"=>"th"
    	));

    	// Strip punctuation and characters that are unsafe in filenames
    	$title = str_replace(
    		array('.','!',',',':',';',"'",'"','\\','/','*','?','<','>','|','+','%'),
    		'',
    		$title
    	);

    	// Collapse runs of spaces into a single space
    	while(strpos($title,'  ')!==false){
    		$title=str_replace('  ',' ',$title);
    	}
    	return $title;
    }
    
  3. The original snippet doesn’t cut it. Unicode characters that aren’t covered by htmlentities() are ignored altogether. If you want to (safely) transform a UTF-8 string to alphanumeric characters for use in URLs, give urlify() a shot.

  4. Julien

    I have done this:

    function replace_accents($str) {
    $str = htmlentities($str);
    $str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde|cedil|elig|ring|th|slash|zlig|horn);/','$1',$str);
    return html_entity_decode($str);
    }

