Software testing technology has undergone tremendous development, with many software engineers playing significant roles. Its history spans several decades and was first classified in the 1980s, when the computer scientists Gelperin and Hetzel proposed a scheme of phases and goals (Hetzel, 1990, “The Growth of Software Testing”). In their scheme, the period up to 1956 was dominated by debugging, with no clear distinction drawn between debugging and testing. Between 1957 and 1978, software was put through extensive demonstrations, which led to the separation of testing from debugging. This was followed by a period focused on detecting software errors, between 1979 and 1982. From 1983 to 1987, there was a series of software evaluations aimed at upholding quality in software technology. From 1988 onward, the focus of software developers shifted to preventing possible failures and user dissatisfaction (Hetzel et al, 1990). This paper examines the processes involved in software testing and the development of software testing from manual to automated. To help explain the technology behind software testing and its development, a detailed review of previous research is also given.


Software testing is the process of evaluating a software product and analyzing the results in order to make necessary improvements or changes (IEEE, 1990, “Standard Computer Dictionary”). The test is done under controlled conditions, which should include both normal and abnormal conditions. According to Black (2008), “it is the process of executing a program or system with the intent of finding errors”. Testing can also be described “as the process of validating and verifying that a software program meets specific prerequisites” and works to the expectation of users (Yang & Chao, 1995).

Testing can be done manually or automatically, depending on which method is preferred. Most tests take place only after specific test objectives have been identified and the coding process has been completed (Dustin et al, 1999). This implies that the method used to carry out the test is also governed by the methodology used to develop the software under test.

Software testing is often very expensive, and automation is a promising strategy for reducing time and cost. However, software testing tools and techniques usually suffer from a lack of generic applicability and scalability (Yang & Chao, 1995). The reason is straightforward. For automation to be possible, we must have some means of generating oracles from the specification, and of generating test cases that check the target software against those oracles to decide its correctness. To date, no fully scaled system has achieved this objective, since a significant amount of human intervention is still needed in testing. The level of automation remains at the automated test script level (Dustin et al, 1999).
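The idea of checking a target against an oracle derived from a specification can be illustrated with a minimal sketch. It assumes a hypothetical specification for a sorting routine ("the output is ordered and is a permutation of the input"); the names `sort_oracle`, `run_suite` and the hand-written test cases are illustrative assumptions, not part of any cited tool.

```python
from collections import Counter

def sort_oracle(inp, out):
    """Oracle derived from an assumed specification:
    the output is ordered and is a permutation of the input."""
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    return ordered and Counter(inp) == Counter(out)

def run_suite(target, cases):
    """Run every test case and return those whose output the oracle rejects."""
    return [case for case in cases if not sort_oracle(case, target(list(case)))]

# Hand-written test cases; a fully automated system would generate these too.
cases = [[], [1], [3, 1, 2], [2, 2, 1], [5, 4, 3, 2, 1]]

# The built-in sorted() plays the role of the software under test.
failed = run_suite(sorted, cases)
```

Here the oracle decides correctness without a human inspecting each output, which is the step that the paragraph above identifies as the obstacle to full automation.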

Tests are called clean, or positive, if they aim to validate that the product works. The difficulty is that a clean test validates the software only for the particular case it exercises. A single failed test is enough to show that the software does not work, but no finite number of tests can prove that it works in all circumstances. Tests that aim to break the software, or to show that it does not work, are called dirty or negative tests. For a piece of software to survive a reasonable level of dirty testing, it must have adequate exception handling capabilities (Yang & Chao, 1995).
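The difference between a positive and a negative test can be sketched as follows. The function `parse_age` is a hypothetical unit under test introduced only for illustration; the negative test passes only if every invalid input is rejected through the exception handling path.

```python
def parse_age(text):
    """Hypothetical function under test: parse a non-negative integer age."""
    value = int(text)  # int() raises ValueError for non-numeric text
    if value < 0:
        raise ValueError("age must be non-negative")
    return value

def positive_test():
    """Clean test: a valid input produces the expected result."""
    return parse_age("42") == 42

def negative_test():
    """Dirty test: every invalid input must be rejected with ValueError."""
    for bad in ["", "abc", "-1", "4.5"]:
        try:
            parse_age(bad)
        except ValueError:
            continue      # rejected as required
        return False      # an invalid input was accepted
    return True
```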

Software reliability is closely connected with many aspects of software, including its architecture and the amount of testing it has undergone. Based on an operational profile, testing can serve as a statistical sampling method to gain failure data for reliability estimation (Dustin et al, 1999).

Software testing is not a mature discipline. It remains an art, since it has not yet been put on a scientific footing. Testing techniques have changed little: some methods still in use were invented over 25 years ago, and some of them are crafted methods or heuristics rather than sound engineering methods. Testing is expensive, but not testing software is even more expensive, especially where human lives are in danger. In its general form, deciding whether a program is correct is, like the Turing halting problem, undecidable. One therefore cannot be sure of the correctness of a piece of software, since no verification system can verify every correct program, nor can we be certain that a verification system is itself correct (John, Philip, & David, 1988).

There are many software testing techniques, and they can be categorized by purpose and by life cycle phase. By purpose, testing is divided into correctness, performance, reliability, and security testing. By life cycle phase, it is divided into requirements phase testing, design phase testing, program phase testing, evaluation of test results, installation phase testing, acceptance testing, and maintenance testing.

Correctness is the minimum requirement of software and the essential purpose of testing. Correctness testing needs some type of oracle to tell correct behavior from erroneous behavior. The tester may or may not know the inside details of the software module under test, for instance its control flow or data flow. Accordingly, either a white-box or a black-box point of view can be taken when testing software for correctness.

Black-box testing mainly refers to functional testing, a testing method that emphasizes executing the functions and examining their input and output data. The tester treats the software under test as a black box: only the inputs, outputs, and specification are visible, and the functionality is determined by observing the outputs for corresponding inputs. In testing, various inputs are exercised and the outputs are compared against the specification to validate correctness (John, Philip, & David, 1988). All test cases are derived from the specification. No implementation details of the code are considered.
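As a hedged illustration of black-box testing, the sketch below derives input and expected-output pairs from a specification (here, the Gregorian leap-year rule, chosen only as an example) and compares them with the observed outputs. The helper name `black_box_check` and the table `SPEC_CASES` are assumptions introduced for this sketch; the standard library function `calendar.isleap` stands in for the software under test.

```python
import calendar

# (input, expected output) pairs derived from the specification alone:
# divisible by 4, except century years not divisible by 400.
SPEC_CASES = [(2000, True), (1900, False), (2024, True), (2023, False)]

def black_box_check(func, cases):
    """Return (input, expected, actual) for every case where the output
    differs from the specification. The code of func is never inspected."""
    return [(inp, exp, func(inp)) for inp, exp in cases if func(inp) != exp]

mismatches = black_box_check(calendar.isleap, SPEC_CASES)

# A naive implementation that ignores the century rule is caught by one case.
naive_mismatches = black_box_check(lambda year: year % 4 == 0, SPEC_CASES)
```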

Clearly, the more of the input space is covered, the more problems will be found, and the more confident we can be about the quality of the software. However, exhaustive testing of all combinations of valid inputs is impossible for most programs, not to mention invalid inputs, timing, sequence, and resource variables. Combinatorial explosion is the major barrier in functional testing. We can also never be sure whether the specification is correct or complete.
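A short calculation makes the combinatorial explosion concrete. The figures are assumptions chosen for illustration: a function taking two 32-bit integer inputs, tested at one billion cases per second. Under those assumptions, exhaustive testing of the valid inputs alone would take roughly 585 years.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def exhaustive_test_count(bits_per_input, num_inputs):
    """Number of distinct input combinations for fixed-width inputs."""
    return (2 ** bits_per_input) ** num_inputs

def years_to_run(total_tests, tests_per_second):
    """Time needed to execute every test at a constant rate."""
    return total_tests / (tests_per_second * SECONDS_PER_YEAR)

# Assumed scenario: two 32-bit inputs, one billion tests per second.
total = exhaustive_test_count(32, 2)
years = years_to_run(total, 1_000_000_000)
```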

According to Beizer (1995), ambiguity is often unavoidable because of the limitations of the language in which specifications are written. Even if one uses some type of formal or restricted language, one may still fail to write down all the possible cases in the specification. Sometimes the specification itself becomes an intractable problem: it is not possible to specify precisely every situation that can be encountered using limited words. People “can seldom specify clearly what they want” (Hetzel, 1988); they can usually tell whether a product is what they want only after it has been finished. Specification problems contribute to about 30 percent of all bugs in software.

Many techniques are available in white-box testing, because the problem of intractability is eased by specific knowledge of, and attention to, the structure of the software under test. The intention of exhausting some aspect of the software is still strong in white-box testing, and some degree of exhaustion can be achieved, such as executing each line of code at least once, traversing every branch, or covering all the possible combinations of true and false condition predicates (Hetzel, 1988).
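Branch coverage, one of the exhaustion criteria mentioned above, can be sketched with manual instrumentation. The function `classify` and its recording set `taken` are hypothetical and exist only for this illustration; practical coverage tools instrument code automatically rather than by hand.

```python
def classify(x, taken):
    """Hypothetical function under test; `taken` records each branch outcome."""
    if x < 0:
        taken.add(("x<0", True))
        x = -x
    else:
        taken.add(("x<0", False))
    if x > 100:
        taken.add(("x>100", True))
        return "large"
    taken.add(("x>100", False))
    return "small"

# Every outcome of every decision in classify().
ALL_BRANCHES = {("x<0", True), ("x<0", False), ("x>100", True), ("x>100", False)}

def branch_coverage(inputs):
    """Fraction of branch outcomes exercised by the given test inputs."""
    taken = set()
    for x in inputs:
        classify(x, taken)
    return len(taken & ALL_BRANCHES) / len(ALL_BRANCHES)
```

With the single input 5, only the two false outcomes are exercised (coverage 0.5); adding -500 exercises both true outcomes and brings coverage to 1.0.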

As opposed to black-box testing, “software is viewed as a white-box or glass-box in white-box testing, as the structure and flow of the software under test are visible to the tester”. Testing plans are made according to the details of the software implementation, such as programming language, logic, and styles. Test cases are derived from the program structure. 

Control-flow testing, loop testing, and data-flow testing all map the corresponding flow structure of the software into a directed graph. Test cases are carefully selected based on the criterion that all the nodes or paths are covered or traversed at least once. By doing so we may discover “unnecessary dead code”, that is, code that is of no use or is never executed at all, which cannot be discovered by functional tests (Hamlet, 1994).
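The mapping of program structure to a directed graph can be sketched as below. The graph `CFG` is a hand-written, hypothetical control-flow graph (real tools derive it from source code); a breadth-first search from the entry node finds the reachable nodes, and whatever remains is a candidate for dead code.

```python
from collections import deque

# Hypothetical control-flow graph: node -> list of successor nodes.
CFG = {
    "entry": ["check"],
    "check": ["then", "else"],
    "then": ["exit"],
    "else": ["exit"],
    "legacy": ["exit"],  # no edge leads to this node
    "exit": [],
}

def reachable(graph, start):
    """Breadth-first search returning every node reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Nodes that no path from the entry can reach are dead code candidates.
dead_nodes = set(CFG) - reachable(CFG, "entry")
```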

In mutation testing, the original program code is perturbed to create many mutated programs, each containing one fault. Each faulty version of the program is called a mutant. Test data are selected based on their effectiveness in making the mutants fail. The more mutants a test case can kill, the better the test case is considered. The problem with mutation testing is that it is often too computationally expensive to use. The boundary between “black-box approach and white-box approach” is, however, not clear-cut (Hetzel, 1988). Many of the testing strategies mentioned above may not be safely classified as black-box testing or white-box testing. The same is true for “transaction-flow testing, syntax testing, finite-state testing, and many other testing strategies not discussed in this text”. One reason is that all the above techniques need some knowledge of the specification of the software under test. Another reason is that “the idea of specification itself is broad”: it may contain any requirement, including the structure, programming language, and programming style, as part of the specification content (Hetzel, 1988).
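The mutation testing procedure described above can be sketched on a very small scale. The example is an assumption made for illustration: a two-argument maximum function whose comparison operator is replaced to produce three mutants. Real mutation tools perturb source code automatically; here the mutants are built by hand.

```python
import operator

def make_max(compare):
    """Build a two-argument 'maximum' function around a comparison operator."""
    def max_of(a, b):
        return a if compare(a, b) else b
    return max_of

original = make_max(operator.gt)

# Each mutant differs from the original by one operator replacement.
mutants = {
    "gt->lt": make_max(operator.lt),
    "gt->ge": make_max(operator.ge),
    "gt->le": make_max(operator.le),
}

def killed_by(test_input):
    """Names of the mutants whose output differs from the original's."""
    a, b = test_input
    expected = original(a, b)
    return {name for name, mutant in mutants.items() if mutant(a, b) != expected}
```

The input (3, 5) kills two of the three mutants, while (4, 4) kills none. The "gt->ge" mutant returns the same value as the original for every pair of integers, so no test can kill it; such equivalent mutants are one reason the technique is costly.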

We may be reluctant to consider random testing as a testing technique. Test case selection is simple and straightforward: test cases are chosen at random. Research has shown that random testing is more cost-effective for many programs. Some very subtle errors can be discovered at low cost. Nor is it inferior in coverage to other, carefully designed testing techniques. One can also obtain a reliability estimate from random testing results based on operational profiles. Effectively combining random testing with other testing techniques may yield more powerful and cost-effective testing strategies.
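A minimal sketch of random testing follows. The function `buggy_abs` is a hypothetical unit under test with a fault deliberately seeded at a single input, and the property "the result is never negative" serves as a simple oracle. The input range, trial count, and seed are arbitrary choices for illustration, and a uniform distribution is assumed in place of a real operational profile.

```python
import random

def buggy_abs(x):
    """Hypothetical function under test with a fault seeded at one input."""
    if x == -7:
        return x  # seeded fault: the sign is not removed
    return -x if x < 0 else x

def random_test(func, trials, seed=0, low=-20, high=20):
    """Draw inputs uniformly at random and collect those that violate
    the property 'the result is never negative'."""
    rng = random.Random(seed)  # fixed seed keeps the run repeatable
    failures = set()
    for _ in range(trials):
        x = rng.randint(low, high)
        if func(x) < 0:
            failures.add(x)
    return failures
```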

Not all software systems have explicit performance specifications, but every system has implicit performance requirements. The software should not take infinite time or resources to execute. The term “performance bugs” is sometimes used to refer to design problems in software that cause the system performance to deteriorate (Hetzel, 1988).

Software reliability is the probability of failure-free operation of a system. It is related to many characteristics of software, including the testing process. Directly estimating software reliability by quantifying its related factors can be difficult. Automated testing is an appropriate sampling method for measuring software reliability. Guided by the operational profile, “software testing can be used to detect failure data”, and an estimation model can then be used to analyze the data, estimate the current reliability, and predict future reliability. Based on the estimate, the engineers can decide whether to release the software, while the users can decide whether to install and use it.
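One simple way to turn failure data into a reliability estimate is sketched below. It assumes a constant failure rate (the exponential model), which is only one of many possible estimation models and is not claimed to be the one used in the cited work; the figures of 4 failures in 2000 hours of testing are invented for illustration.

```python
import math

def failure_rate(num_failures, total_hours):
    """Estimated failures per hour from observed test data."""
    return num_failures / total_hours

def reliability(rate, mission_hours):
    """Probability of failure-free operation for mission_hours,
    assuming a constant failure rate: R(t) = exp(-rate * t)."""
    return math.exp(-rate * mission_hours)

# Assumed data: 4 failures observed over 2000 hours of testing.
rate = failure_rate(4, 2000.0)
r10 = reliability(rate, 10.0)  # estimated reliability over a 10-hour run
```

Under these assumptions the estimated reliability for a 10-hour run is about 0.98, a figure that could then be compared with the release requirement.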

Robustness of a software component is “its ability to function correctly in the presence of excessive inputs or stressful environmental conditions” (James, & Bret, 2001). Unlike correctness testing, robustness testing is not concerned with the functional correctness of the software. It only checks for robustness problems such as machine crashes, process hangs, or abnormal termination. As a result, the oracle is comparatively simple, and robustness testing can be made more portable and more scalable than correctness testing (James, & Bret, 2001). Stress testing, or load testing, is often used to test the whole system rather than the software alone. In such tests the software or system is exercised at or beyond the specified limits. Typical stresses include resource exhaustion, bursts of activity, and sustained high loads.
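The simplicity of the robustness oracle can be seen in a small sketch: the harness does not check whether results are correct, only whether the component terminates abnormally. The component `safe_divide`, the harness `robustness_failures`, and the list of hostile inputs are all hypothetical and chosen for illustration.

```python
def safe_divide(a, b):
    """Hypothetical component under test with defensive error handling."""
    try:
        return a / b
    except (ZeroDivisionError, TypeError):
        return None

def robustness_failures(func, hostile_inputs):
    """Return (arguments, exception name) for every input that makes func
    raise. The returned values themselves are never checked."""
    crashes = []
    for args in hostile_inputs:
        try:
            func(*args)
        except Exception as exc:
            crashes.append((args, type(exc).__name__))
    return crashes

HOSTILE = [(1, 0), ("a", 2), (None, None), (float("nan"), 1)]

crashes = robustness_failures(safe_divide, HOSTILE)
fragile_crashes = robustness_failures(lambda a, b: a / b, HOSTILE)
```

The defended component survives all four inputs, while the unprotected division fails on the first three.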

Software quality, reliability, and security are closely interdependent. Flaws in software can enable hackers and crackers to eavesdrop and open security holes (John, Philip, & David, 1988). Many critical software applications and services have complex security measures against malicious attacks. The objectives of security testing of these systems include identifying and removing software flaws that may potentially lead to security breaches, and validating the robustness of the security measures. Simulated security attacks can also be performed to find vulnerabilities.

Testing is potentially endless. We cannot keep testing until all the defects are revealed and removed; that is simply not possible (John, Philip, & David, 1988). At some point, we have to stop testing and ship the software. The question is when.

Realistically, testing is a trade-off between budget, time, and quality. It is driven by profit models. The pessimistic, and unfortunately most frequently used, approach is to stop testing whenever any of the allocated resources is exhausted. Automating the testing process therefore helps to reduce the expenses incurred. The optimistic stopping rule is to stop testing when either reliability meets the requirement or the benefit from continued testing cannot justify the testing cost. This usually requires the use of reliability models to evaluate and predict the reliability of the software under test. Each evaluation requires repeated running of the following cycle: failure data gathering, modeling, and prediction. The method does not fit well for “hyper-dependable systems”, given that the actual field failure data will take too long to accumulate (James, & Bret, 2001).
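The optimistic stopping rule can be written down as a small decision function. This is a sketch under stated assumptions: the function name `should_stop` and its four parameters are hypothetical, and the reliability estimate and the benefit and cost figures are taken as given, since producing them is the job of the reliability model.

```python
def should_stop(estimated_reliability, required_reliability,
                expected_benefit, cost_of_next_cycle):
    """Optimistic stopping rule: stop when the reliability requirement is
    met, or when another test cycle would cost more than it is worth."""
    if estimated_reliability >= required_reliability:
        return True, "reliability requirement met"
    if expected_benefit < cost_of_next_cycle:
        return True, "further testing not cost-justified"
    return False, "continue testing"
```

For example, `should_stop(0.95, 0.99, 500.0, 2000.0)` stops on cost grounds even though the reliability target has not been reached.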

Literature Review

It is prudent for all programming analysts to have an idea of what software engineering is about. A sound understanding of the concept of software engineering helps an analyst to do his or her job well.

Practical knowledge of software engineering is found largely in accounts of four past phases in the use of computers: the 1950s; the early and mid 1960s; the early and mid 1970s; and the early and mid 1980s up to the present (Hetzel, 1988).

The 1960s saw the computing industry experience some major problems. These problems stemmed from several human, environmental, and economic factors. The human factor arose because software was becoming more and more prominent in daily human activities, so that machines were being introduced at a speed with which programmers found it hard to cope (Hetzel, 1988). The environmental and economic factors resulted from the pressure put on programming infrastructure and on the budget allocated to the programming effort (James, & Bret, 2001).

The turn of the century has seen these problems persist. They have grown along with the growth of programming culture, taking more current forms of past issues. During the system development life cycle, there are several considerations that the system analyst and the major stakeholders should take into account in order to adapt to the new system (Hetzel, 1988). These considerations concern the overall cost of implementing a new system, reviews of how the system will function, and the problems likely to occur during implementation of the new system (James, & Bret, 2001).

Once the stakeholders realize the importance of reading the small print in manuals and attending to every detail, many of these problems can be taken care of. It is important for them to be clear on how the system works and to make this known to the various players in the industry (Dustin, et al, 1999). This will not only minimize the problems occurring in the life cycle but will also create a clear understanding among all who are involved, dispelling misconceptions and excessive expectations placed on the system and its developers (Dustin, et al, 1999).

It is difficult to understand how a system works without placing it in the correct context and perspective (Hetzel, 1988). The same is true of all forms of software products, which are best understood when viewed in the right light. This in turn improves understanding of what drives them and may reduce some of the problems that they cause (Hetzel, 1988).

This brings us to the conclusion that a little organization in the overall system process goes a long way toward improving the general picture. The dynamic nature of system development brings into play various environmental and human factors which need to be dealt with (Philip et al, 1997). In the long run, obtaining the best possible system requires a number of considerations to be made in several areas (Philip et al, 1997).

Once you have a full grasp of the issues above, you can understand that software testing is really the backbone of this process. This is because it is a crucial factor in determining the feasibility of the entire project.

Software testing, which is a complex subject, can mean different things to different people. Arriving at an overall view requires understanding the basic element of software testing, which is testing itself (Yang & Chao, 1995).

Over the years, there have been various views on testing. A useful one for this purpose is that of D. Gelperin and W.C. Hetzel, who in 1988 drew practical conclusions on matters of testing (Yang & Chao, 1995). Their analysis distinguishes five periods. The first, up to 1956, saw no difference between testing and removing bugs and defects (Philip et al, 1997). This period generally focused on the processes involved in removing defects from a system. It gave way to a period from 1957 to 1978 that was characterized by the establishment of the requirements stage of a new system life cycle (James, & Bret, 2001). The requirements had to meet the demands of the new system and its stakeholders. This period followed from the successful differentiation of the processes involved in testing and bug removal (Philip et al, 1997).

The period between 1979 and 1982 saw a real focus on detecting errors and was called the destruction-oriented period. From 1983 to 1987 the analysis moved to focusing on product details, providing an overview of what was expected from the product being developed. Finally, the period from 1988 onward was one in which caution was the driving force: software developers sought to deal with problems before they could occur (Hetzel, 1988).

Programmers also needed to ensure that their code produced the expected results. Working with software analysts, they checked that the code was functional by making it meet a required set of conditions. Details on unit testing can be found in a 2006 survey of test-driven development (TDD), although that survey is not considered decisive on this point (Hetzel, 1988).

TDD is also applicable in Extreme Programming (XP), which is an agile method. Other methods, such as the V-Model or the Rational Unified Process (RUP), differ from agile methods in their performance levels, which are higher. The XP approach has improved communication, interaction, and satisfaction for both parties in system development because of its simplicity (Hetzel, 1988).

Test frameworks like JUnit are important in unit testing, which in turn is important for specification. Unit tests call for the system to be functioning at all times with the fewest possible malfunctions (Hetzel, 1988).
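A unit test in this style can be sketched with Python's standard `unittest` module, which follows the same xUnit conventions as JUnit. The function `add` and the test class `AddTest` are hypothetical examples, not taken from the cited sources.

```python
import unittest

def add(a, b):
    """Hypothetical unit under test."""
    return a + b

class AddTest(unittest.TestCase):
    """Each test method states one expectation about the unit."""

    def test_adds_integers(self):
        self.assertEqual(add(2, 3), 5)

    def test_zero_is_identity(self):
        self.assertEqual(add(7, 0), 7)

def run_add_tests():
    """Load and run the test case, returning the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AddTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

In test-driven development, tests such as these would be written before `add` itself and rerun after every change.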

The third practice is pair programming. People work in pairs so that anything overlooked by one person is noticed by their partner on the assignment. Error rates are further reduced by continually swapping partners to ensure optimal output (John, Philip, & David, 1988). Although pair programming is initially expensive, it proves to save a lot of resources in the long run.

Unlike behavior-driven development (BDD), TDD focuses on the tests and on the overall outlook of the program. How a person thinks is influenced by language, and BDD uses this observation to build on TDD. This moves the focus from what the software is to how it behaves (John, Philip, & David, 1988).

Attaining better quality through software testing is complicated. Using testing to locate and correct software defects can be an endless process, and bugs cannot be completely removed. As the complexity barrier indicates, testing and fixing problems may not necessarily improve the quality and reliability of the software. At times, fixing a problem may introduce much more severe problems into the system, as happened after bug fixes in the telephone outages in California and on the eastern seaboard in 1991. The disaster happened after three lines of code were changed in the signaling system (John, Philip, & David, 1988).

Using formal methods to "prove" the correctness of software is also an attractive research direction. But it cannot surmount the complexity barrier either, and it works well only for relatively simple software. It does not scale well to complex, full-fledged large software systems, which are more vulnerable to error (Hetzel, 1988).

In a broader view, the ultimate purpose of testing must be questioned: are there effective testing methods at all, given that finding defects and removing them does not necessarily lead to better quality? An analogy can be drawn with the car manufacturing process. In the craftsmanship epoch, cars were built and the problems and defects were then hacked away (Hetzel, 1988). But such methods were swept away by quality engineering processes, which make the car defect-free in the manufacturing phase.

The following reported cases illustrate the consequences of software defects. In one case, a software bug resulted in incorrect indicators of signal strength in a phone's interface. Reportedly, customers had been complaining about the problem for several years. The company provided a fix for the problem several weeks later (John, Philip, & David, 1988).

Email services of a major smart phone system were interrupted or unavailable for nine hours in December 2009, the second service interruption within a week, according to news reports. The problems were believed to be due to bugs in new versions of the email system software. It was reported in August 2009 that a large suburban school district introduced a new computer system that was 'plagued with bugs' and resulted in many students starting the school year without schedules or with incorrect schedules, and many problems with grades. Upset students and parents started a social networking site for sharing complaints.

In February of 2009, users of a major search engine site were prevented from clicking through to sites listed in search results for part of a day. It was reportedly “due to software that did not effectively handle a mistakenly placed in an internal ancillary reference file” that was frequently updated for use by the search engine (Hetzel, 1988). Instead of being able to click through to listed sites, users were redirected to an intermediary site which, as a result of the suddenly enormous load, was rendered unusable.

A large health insurance company was allegedly banned by regulators from selling certain types of insurance policies following ongoing computer system problems that resulted in denial of coverage for needed medications and mistaken overcharging or cancellation of benefits. The regulatory agency was quoted as stating that the problems were posing “a serious threat to the health and safety of beneficiaries” (John, Philip, & David, 1988).

A news report in January 2009 indicated, “a major IT and management consulting company was still battling years of problems in implementing its own internal accounting systems” (Dustin et al, 1999). 

In August 2008, “it was reported that more than 600 U.S. airline flights were significantly delayed due to a software glitch in the U.S. FAA air traffic control system” (Dustin et al, 1999). The problem was claimed to be a 'packet switch' that 'failed due to a database mismatch', and occurred in the part of the system that handles required flight plans (Hetzel, 1988).

A major clothing retailer was reportedly hit with significant software and system problems when attempting to upgrade their online retailing systems in June 2008. Problems remained ongoing for some time. When the company made their public quarterly financial report, the software and system problems were claimed as the cause of the poor financial results (Hetzel, 1988).

Software problems in the automated baggage sorting system of a major airport in February 2008 prevented thousands of passengers from checking baggage for their flights (Hetzel, 1988). It was reported that the breakdown occurred during a software upgrade, despite pre-testing of the software. The system continued to have problems in subsequent months (Dustin et al, 1999).

News reports in December of 2007 indicated that significant software problems were continuing to occur in a new ERP payroll system for a large urban school system. It was believed that more than one third of employees had received incorrect paychecks at various times since the new system went live the preceding January, “resulting in overpayments of $53 million” (Dustin et al, 1999). 

An employees' union brought a lawsuit against the school system; “the cost of the ERP system was expected to rise by 40% and the non-payroll part of the ERP system was delayed” (Dustin et al, 1999). Inadequate testing reportedly contributed to the problems. The school system was still working on cleaning up the aftermath of the problems in December 2009, going so far as to bring lawsuits against some employees to get them to return overpayments (Dustin et al, 1999).

In November of 2007, a regional government brought a multi-million dollar lawsuit against a software services vendor, “claiming that the vendor minimized quality in delivering software for a large criminal justice information system” and the system did not meet requirements. The vendor also sued its subcontractor on the project (Yang, & Chao, 1995).

In June of 2007 news reports revealed that software flaws in “a popular online stock-picking contest” could be used to gain an unfair advantage in “pursuit of the game's large cash prizes”. Outside investigators were called in and in July the contest winner was announced. According to the report, the winner had previously been in 6th place, indicating that the top 5 contestants may have been disqualified (Yang, & Chao, 1995).

A software problem contributed to a rail car fire in a major underground metro system in April of 2007. The software reportedly failed to perform as expected in detecting and preventing excess power usage in equipment on new passenger rail cars, resulting in overheating and fire in the rail car, and evacuation and shutdown of part of the system (John, Philip, & David, 1988).

News reports in May of 2006 described a multi-million dollar lawsuit settlement paid by a healthcare software vendor to one of its customers. It was reported that the customer claimed there were problems with the software they had contracted for, including poor integration of software modules, and problems that resulted in missing or incorrect data used by medical personnel (James, & Bret, 2001).

A newspaper article reported that a “major hybrid car manufacturer had to install a software fix on 20,000 vehicles due to problems with invalid engine warning lights and occasional stalling” (James, & Bret, 2001). In the article, “an automotive software specialist indicated that the automobile industry spends $2 billion to $3 billion per year fixing software problems” (James, & Bret, 2001).

Media reports in January of 2005 detailed “severe problems with a $170 million high-profile U.S. government IT systems project”. Software testing was one of the five major problem areas according to a report of the commission reviewing the project (James, & Bret, 2001). 

Research Methodology

In the software industry, the use of implicit and explicit ratings as a research methodology is common and familiar to scientific researchers, just as a grading system is necessary in learning institutions to evaluate students' performance. Researchers of the caliber of Alton Scheid have provided an extensive literature in this area of computer science (Anick, 2003).

Ratings made on a given scale enable researchers to make precise judgments and to produce statistically processed figures that can then be used in assessing the situation in question. In computing and information technology, and in software engineering in particular, the predominant scientific research methodologies are the explicit and implicit approaches (Morita & Shinoda, 1994).

Implicit Rating

According to Oard and Kim (1996), implicit feedback techniques are effective in obtaining information on users' behaviors, which are important in determining the preferences and interests of different users of the software. Oard and Kim grouped the observable characteristics of users by minimal scope and behavior category: the minimal scope represents the smallest feasible segment of the item being acted upon by the user, while the behavior category refers to the examination and annotation of users' behaviors (Oard and Kim, 1996).

Explicit Rating

Unlike the implicit rating methodology, the explicit approach is more like a field questionnaire, engaging the physicians directly in giving information about themselves to be used in analyzing their preferences. Oard and Marchionini (1996) emphasized that even though the explicit rating system is also very important, it is prone to bias and data inaccuracy. For this reason, they proposed that the implicit and explicit rating systems should be used together to improve feedback quality.

In line with the arguments of researchers of the caliber of Oard, research on software engineering should exploit both the implicit and the explicit methodologies in obtaining data and information from end users. The feedback is likely to shed more light on the way forward, helping to develop the software technology and improve outcomes for physicians and practitioners, whose interests and demands for a variety of information are increasing rapidly with time (Morita & Shinoda, 1994).

Other Strategies

The preceding research literature can also be employed in an endeavor to acquire the most accurate data. It includes previous studies that provide a varied group of indicators of users' interests, both explicit and implicit, in an attempt to address the question of how user behaviors can be utilized as implicit measures of their interests.

With the aid of other scientists, Morita and Shinoda (1994) experimented with using users' reading time to automatically re-rank sentence-based synopses of the documents retrieved by the users. The performance of the implicit system was also considered in the research. Morita and Shinoda (1994) explored the behaviors of users through their assessment of newsgroup articles, which in their view could be utilized as implicit feedback for acquiring users' profiles and for information filtering.

Research Findings

There are different models of the software development life cycle, including the waterfall model and the spiral model (Hetzel, 1988). Though the models differ, they operate on certain common principles clustered into a number of phases. The requirements stage of software development is essentially characterized by software testing processes (Hetzel, 1988). In all the models of the software development cycle, testing tends to begin in the initial stages, to ensure that the final product of the system development life cycle is fit for the task for which it was intended. AdaTEST and Cantata are useful products for testing software and are very useful in the software testing stages. Both are products of a company that was accredited to ISO 9001 in 1988 (Smith, 1990).

The software requirements stage involves extensive software testing. Testing is done to ensure that the program being developed has no major problems during the implementation stage (Philip, John, Christopher, & Daniel, 1997). It is done by exposing the program to conditions likely to cause poor performance and then finding and eliminating the resulting errors. After software testing, the stakeholders, in this case the end users of the new software product, can feel assured that the product performs at the best possible level. Testing is thus conducted to uphold customer satisfaction with the new software product or service (Philip et al., 1997).
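
As a minimal sketch of exposing a program to both normal and abnormal conditions, the Python example below exercises a small function with valid input and with inputs chosen to provoke failures. The function parse_age, its validation rules, and the helper rejects are hypothetical and exist only for illustration; they do not come from the cited studies.

```python
def parse_age(text):
    """Hypothetical function under test: convert user input to an age in years."""
    value = int(text.strip())  # int() raises ValueError for non-numeric input
    if not 0 <= value <= 150:
        raise ValueError(f"age out of range: {value}")
    return value


def rejects(text):
    """Return True if parse_age refuses the input with a ValueError."""
    try:
        parse_age(text)
    except ValueError:
        return True
    return False


# Normal conditions: well-formed input is accepted.
assert parse_age("42") == 42
assert parse_age(" 7 ") == 7

# Abnormal conditions: malformed or implausible input must be rejected.
for bad in ["", "abc", "-1", "151", "4.5"]:
    assert rejects(bad), bad
```

The abnormal cases matter as much as the normal ones: a program that silently accepts "151" or "abc" would pass every normal-condition check and still fail its users.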

When software testing is not performed with care, the new system is susceptible to bugs. “Bug” is a term used mostly by software engineers for a defect in a software system that can cause errors in program behavior and leave end users dissatisfied (Gelperin, 1988; Smith, 1990).

To avoid bugs and system failure, programmers set out to discover all possible faults, though not every error is tracked down. Discovering the faults involves comparing the new system with either a theoretically perfect system or an earlier system that end users have used successfully (Cornett, 1996). With these two comparisons, the programmer can detect what is likely to work when the system is implemented and what is prone to failures that may cause the program to perform below expectations (Cornett, 1996; Beizer, 1990). A theoretically perfect system gives the stakeholders a general overview of how the system is expected to run, against which the program’s actual behavior can be judged. An earlier system that reached the required standards acts as a frame of reference, providing information on what can work and what is not practically executable (Cornett, 1996; Beizer, 1990).
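
The comparison against a previously successful system can be sketched as a differential test: the same inputs are fed to a trusted reference and to the new implementation, and any disagreement is recorded. In the Python sketch below, the built-in sorted function stands in for the proven system and insertion_sort for the new one; both roles, the function names, and the random input ranges are assumptions made for illustration.

```python
import random


def reference_sort(items):
    """Trusted reference: Python's built-in sort stands in for the proven system."""
    return sorted(items)


def insertion_sort(items):
    """Hypothetical new implementation under test."""
    result = []
    for item in items:
        pos = len(result)
        # Walk left until the element before pos is not greater than item.
        while pos > 0 and result[pos - 1] > item:
            pos -= 1
        result.insert(pos, item)
    return result


def find_disagreements(trials=200, seed=0):
    """Run both systems on random inputs and collect the inputs where they differ."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        if insertion_sort(data) != reference_sort(data):
            failures.append(data)
    return failures


print(len(find_disagreements()), "disagreements found")
```

An empty list of disagreements does not prove the new system correct; it only shows that it matched the reference on the inputs that were tried.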

Apart from discovering errors that could occur after implementation, software testing has another critical function (Yang & Chao, 1995): determining how useful the system will be to its users once implemented. The end users of a new system have unique needs that must be addressed during testing to ensure their satisfaction. Software testing should show that a given piece of software meets the requirements of a security firm, if that is the end user, or the needs of a college student using the system (Yang & Chao, 1995).

Automated software testing was introduced to replace the manual testing routine. Research findings have shown that manual testing is still sometimes preferred, because it tracks down many software defects just as modern automated tests do, but it is tedious and time consuming. Automation is intended to speed up the process (Beizer, 1990). Automated software testing has been especially preferred for software products with a long lifespan. From 1983, there was a series of software evaluations aimed at upholding quality in software technology. Software testing picked up in 1988, when the focus of software developers was to prevent possible failures and user dissatisfaction (Hetzel et al, 1990). This goal paved the way for the development of automated software testing. The work presented here re-examines the fundamentals and scientific principles of software testing technology.

Discussions and Recommendations

It is common knowledge that software programs run on computers. In today’s high-tech world, almost every system is run to some extent by computers, from traffic control departments to spaceships, and from the studies of medical students to patients on life support machines (Hamlet, 1994). The importance of software testing therefore cannot be overemphasized. A small problem caused by a bug can have devastating effects on a country’s economy and can even cost lives (Hamlet, 1994).

With this in mind, quality becomes a major component of software testing. Because programmers are prone to human error, a quality system is obtained only after several rounds of trial and error through continuous debugging. Apart from debugging, a system must be tested for compatibility. This is when the programmers and the various stakeholders sit down to choose the system that works best for them, considering the human and environmental factors that the organization is most comfortable with. In the process, the organization should also choose, from a collection of candidate models for the new system, the model that best meets its needs with the fewest defects (James & Bret, 2001).

To meet these quality requirements, step-by-step software testing is necessary. The stages of software development analyzed in the system life cycle determine the stages of software testing (John, Philip, & David, 1988). The first stage is determining the requirements and ensuring that they meet the needs for which the system is intended. It should be noted that the system development life cycle goes hand in hand with the product cycle. The software is only one part of the entire product, which makes up a system with several parts (John, Philip, & David, 1988).

All system development models include software testing. Examples include the V model and the waterfall model. Every model is divided into several stages: software requirements, software design, software implementation, software verification, and finally maintenance. Software testing is done in nearly every stage of system development (Dustin et al., 1999).

After the software has gone through the entire software development life cycle, it joins the overall system. This overall system now forms the end product of system development life cycle (John, Philip, & David, 1988).

Before system development begins, stakeholders need to take note of the financial requirements. The bulk of the monetary input in the system development life cycle goes to the software testing stage (Cornett, 1996), because quality requirements mean that testing is repeated many times. One form of repeated testing is regression testing, which considerably cuts the cost of testing because it reduces risk (Dustin et al., 1999). Regression testing involves re-running earlier tests that previously produced good results, to check whether a new error has been introduced by a recent change so that the problem can be corrected. Automating these tests has made the process even simpler and less time consuming (Dustin et al., 1999).
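
Automated regression testing can be sketched with Python’s standard unittest module: tests that passed before a change are kept in a suite and re-run after every modification. The function apply_discount and the specific test cases are hypothetical, chosen only to illustrate the idea.

```python
import unittest


def apply_discount(price, percent):
    """Hypothetical function that was recently modified."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (100 - percent) / 100, 2)


class RegressionSuite(unittest.TestCase):
    """Tests that passed before the change; they are re-run after every change."""

    def test_no_discount(self):
        self.assertEqual(apply_discount(80.0, 0), 80.0)

    def test_half_price(self):
        self.assertEqual(apply_discount(80.0, 50), 40.0)

    def test_invalid_percent_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(80.0, 120)


def run_regression():
    """Run the whole suite and report whether every earlier result still holds."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RegressionSuite)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    print("regression suite passed:", run_regression())
```

Because the suite runs unattended, it can be repeated after each change at almost no extra cost, which is where the saving over manual re-testing comes from.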

All software testing processes should start as early as possible in the system development life cycle. The earlier software checking begins, the better the system becomes at fulfilling its intended functions (Black, 2008).

System development life cycle models such as the V model and the waterfall model are quite delicate. Both have to start successfully, and caution should be taken throughout the process to avoid incidental errors; otherwise the entire system may malfunction (James & Bret, 2001; Dustin et al., 1999).

For all models of the system development life cycle, automated testing is crucial, because it allows system checks to be run repeatedly. To make the checking easier, AdaTEST and Cantata have proved to be very useful (James & Bret, 2001).

As noted earlier, bugs are minor or major defects that can cause real havoc in a system. It is not possible to detect all the bugs in a given system. Even the simplest system can take several years of debugging, which is not practical (Kaner, 2006). Most systems are extremely complex, and debugging thoroughly enough for a program to be 100% effective could run into millions of years (Mark & Dorothy, 1999). System defects occur mostly because of human error, which should not be mistaken for carelessness on the part of the programmers. Software programs and applications are extremely complex, which makes their bugs equally complex. Adding to the complexity, a system operates under environmental influences. Human, structural, and infrastructural factors all play a role in speeding up or slowing down the process of debugging (Kaner, 2006; Oard & Kim, 2001).
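
A rough calculation shows why exhaustive checking is impractical. The Python sketch below estimates how long it would take to try every possible input once, assuming a rate of one billion test executions per second; both the rate and the chosen input widths are assumptions for illustration, not figures from the cited sources.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_test_exhaustively(input_bits, tests_per_second):
    """Years needed to try every input of the given bit width exactly once."""
    total_inputs = 2 ** input_bits
    return total_inputs / tests_per_second / SECONDS_PER_YEAR


# Assumed rate: one billion test executions per second.
for bits in (32, 64, 96):
    years = years_to_test_exhaustively(bits, 1e9)
    print(f"{bits} bits of input: about {years:.3g} years")
```

Under these assumptions, a function taking two 32-bit integers (64 bits of input) would need roughly 585 years, and one taking three such integers would need more than two trillion years, so even small programs cannot be tested on every input.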

Most software consists of digital systems, which differ remarkably from physical systems. While a physical system is rigid, a software system is often extremely malleable and delicate. In other words, it is much easier to find a problem in a physical system, where one can readily predict the cause and source of the problem, than in a software system (James & Bret, 2001). Because of this complexity, a system analyst cannot set out to discover all possible causes of errors and malfunctions; doing so would take a great deal of time, perhaps a lifetime, without success. This is one of the challenges and risk factors in implementing a new system: there is no total assurance of perfect performance for all users (James & Bret, 2001).

Another difficulty in automated software testing arises because software systems are highly dynamic (Jack & Nguyen, 1999). A software system can have a myriad of problems from a single cause, or a single problem from several causes that are hard to track down. As a result, two tests run on the same system at different times can yield different results. A change in one aspect can affect several functions in the system, and those functions in turn change others. Conversely, a single complication can affect various features of the system, making it hard to identify where the problem originates and where it surfaces (James & Bret, 2001).
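
One common way for the same test to give different results at different times is hidden state that survives between runs. The Python sketch below illustrates this with a hypothetical Counter component and a hypothetical check; neither is drawn from the cited works.

```python
class Counter:
    """Hypothetical component whose behavior depends on hidden internal state."""

    def __init__(self):
        self._calls = 0

    def next_id(self):
        self._calls += 1
        return self._calls


def check_first_id_is_one(counter):
    """A test that assumes a fresh system; returns True when it passes."""
    return counter.next_id() == 1


shared = Counter()

first_run = check_first_id_is_one(shared)        # state is fresh, so the check passes
second_run = check_first_id_is_one(shared)       # state was changed by the first run
isolated_run = check_first_id_is_one(Counter())  # a fresh instance restores repeatability

print(first_run, second_run, isolated_run)
```

Giving each automated test its own fresh instance, as in the last call, is one way to keep results repeatable when the system under test carries state.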

A good illustration of the complexity and dynamic nature of a software system is the pesticide paradox. Imagine a plantation attacked by different types of pests. The owner does not have the time or patience to examine each pest and find out whether they are similar or different. Impatient to get rid of the pests, the owner applies a pesticide that kills only some of them, while others are immune or develop resistance, which makes the job more complex with each attempt to solve the problem. The lesson is that the plantation can be saved only by taking the time to find out exactly which pests are causing harm and then eradicating them. This takes time, patience, and effort.

Clients want a system that meets their unique day-to-day needs, and if problems must be encountered while using the system, they should be as few as possible. Automated software testing was developed to make this process of continually reducing software problems easier and faster (Black, 2008).

Software complexity, “the complexity of current software applications”, can be difficult to comprehend for anyone without experience in modern software development. “Multi-tier distributed systems”, applications utilizing multiple local and remote web services, data communications, enormous relational databases, security complexities, and the sheer size of applications have all contributed to the exponential growth in software complexity (Black, 2008).

When requirements change, the end user may not understand the effects of the changes. If there are many minor changes or any major changes, known and unknown dependencies among parts of the project are likely to interact and cause problems, and the complexity of coordinating changes may result in errors. The enthusiasm of the engineering staff may also be affected. In “some fast-changing business environments”, continuously modified requirements may be a fact of life. In this case, management must understand the resulting risks, and QA and test engineers must adapt and plan for continuous, extensive testing to keep the inevitable bugs from running out of control. Automating the tests is the most practical way to achieve this.


References

Anick, P. “Using terminological feedback for search refinement: A log-based study”. In: SIGIR ’03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY, USA: ACM Press. 2003.

Beizer, B. “Software Testing Techniques”. 2nd Ed. New York: Van Nostrand Reinhold. 1990. 21-430.

Black, R. “Advanced Software Testing, Vol 2: Guide to the ISTQB Advanced Certification as an Advanced Test Manager”. Santa Barbara: Rocky Nook. 2008.

Cornett, S. “Code Coverage Analysis”. USA: Carnegie Mellon University. 1996.

Dustin, E., et al. “Automated Software Testing”. Addison Wesley. 1999.

Gelperin, D. “The Growth of Software Testing”. CACM 31 (6). 1988. ISSN 0001-0782.

Harry, K.N, & Timothy, L.S. “Towards target-level testing and debugging tools for embedded software”. Conference proceedings on TRI-Ada ’93. 1993. 288-124.

Hamlet, D. “Foundations of software testing: dependability theory”. Proceedings of the Second ACM SIGSOFT Symposium on Foundations of Software Engineering. 1994. 128-139.

Hetzel, B. “The Growth of Software Testing”. CACM 31 (6). 1988. ISSN 0001-0782.

IEEE. “IEEE Standard Computer Dictionary, a Compilation of IEEE Standard Computer Glossaries”. New York: IEEE. 1990.

Jack F.H, & Nguyen, H.Q. “Testing Computer Software”. 2nd Ed. New York: John Wiley and Sons, Inc. 1999. 480.

James, B & Bret, P. “Lessons Learned in Software Testing, A Context-Driven Approach”. Wiley. 2001. 13-4.

John, D.V, Philip K.M, & David G.D. “The Ballista Software Robustness Testing Service”. Proceedings of TCS’99. Washington DC. 1988.

Kaner, C. “Quality Assurance Institute Worldwide Annual Software Testing Conference”. Florida Institute of Technology. Orlando, FL, November 2006. 

Kropp, N. P., Koopman, P. J., & Siewiorek, D. P. “Automated robustness testing of off-the-shelf software components”. Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing (Cat. No. 98CB36224).

Morita, M.F, & Shinoda, Y.P. “Information filtering based on user behavior analysis and best match text retrieval”. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Ireland. 1994. 272-281.

Oard, D. W., & Kim, J. “Modeling information content using observable behavior”. Proceedings of the 64th Annual Meeting of the American Society for Information Science and Technology, USA. 2001. 38-45.

Philip K.F, John S.L, Christopher D.R, & Daniel S.J. “Comparing Operating Systems Using Robustness Benchmarks”. 16th IEEE Symposium on Reliable Distributed Systems. Durham, NC: Oct 22-24. 1997. 72-79.

Mark, F & Dorothy, G.S. “Software Test Automation”. ACM Press - Addison-Wesley. 1999.

Savenkov, R. “How to Become a Software Tester”. Roman Savenkov Consulting. 2008. 159.

Robert, V. “Testing Object-Oriented Systems, Objects, Patterns, and Tools”. Addison Wesley Professional. 1999. 45-57.

Smith, C. “Performance Engineering of Software Systems”. Addison-Wesley. 1990.

Yang, M.K. & Chao, A.P. “Reliability-estimation and stopping-rules for software testing, based on repeated appearances of bugs”. IEEE Transactions on Reliability, vo.2. 1995. 315-21.